[2602.22242] Analysis of LLMs Against Prompt Injection and Jailbreak Attacks

[2602.22242] Analysis of LLMs Against Prompt Injection and Jailbreak Attacks

arXiv - AI 3 min read Article

Summary

This paper analyzes the vulnerabilities of Large Language Models (LLMs) to prompt injection and jailbreak attacks, evaluating various defense mechanisms across multiple models.

Why It Matters

As LLMs become integral to various applications, understanding their security vulnerabilities is crucial. This research highlights the risks associated with prompt-based attacks and evaluates potential defenses, informing developers and organizations about necessary precautions.

Key Takeaways

  • LLMs are susceptible to prompt injection and jailbreak attacks, necessitating thorough security assessments.
  • Behavioral variations among different LLMs can lead to inconsistent responses to attacks.
  • Lightweight defense mechanisms can mitigate basic attacks but may fail against complex, reasoning-heavy prompts.

Computer Science > Cryptography and Security arXiv:2602.22242 (cs) [Submitted on 24 Feb 2026] Title:Analysis of LLMs Against Prompt Injection and Jailbreak Attacks Authors:Piyush Jaiswal, Aaditya Pratap, Shreyansh Saraswati, Harsh Kasyap, Somanath Tripathy View a PDF of the paper titled Analysis of LLMs Against Prompt Injection and Jailbreak Attacks, by Piyush Jaiswal and 4 other authors View PDF HTML (experimental) Abstract:Large Language Models (LLMs) are widely deployed in real-world systems. Given their broader applicability, prompt engineering has become an efficient tool for resource-scarce organizations to adopt LLMs for their own purposes. At the same time, LLMs are vulnerable to prompt-based attacks. Thus, analyzing this risk has become a critical security requirement. This work evaluates prompt-injection and jailbreak vulnerability using a large, manually curated dataset across multiple open-source LLMs, including Phi, Mistral, DeepSeek-R1, Llama 3.2, Qwen, and Gemma variants. We observe significant behavioural variation across models, including refusal responses and complete silent non-responsiveness triggered by internal safety mechanisms. Furthermore, we evaluated several lightweight, inference-time defence mechanisms that operate as filters without any retraining or GPU-intensive fine-tuning. Although these defences mitigate straightforward attacks, they are consistently bypassed by long, reasoning-heavy prompts. Comments: Subjects: Cryptography and Security ...

Related Articles

Llms

My AI spent last night modifying its own codebase

I've been working on a local AI system called Apis that runs completely offline through Ollama. During a background run, Apis identified ...

Reddit - Artificial Intelligence · 1 min ·
Llms

Fake users generated by AI can't simulate humans — review of 182 research papers. Your thoughts?

https://www.researchsquare.com/article/rs-9057643/v1 There’s a massive trend right now where tech companies, businesses, even researchers...

Reddit - Artificial Intelligence · 1 min ·
Llms

Depth-first pruning seems to transfer from GPT-2 to Llama (unexpectedly well)

TL;DR: Removing the right transformer layers (instead of shrinking all layers) gives smaller, faster models with minimal quality loss — a...

Reddit - Artificial Intelligence · 1 min ·
[2603.23966] Policy-Guided Threat Hunting: An LLM enabled Framework with Splunk SOC Triage
Llms

[2603.23966] Policy-Guided Threat Hunting: An LLM enabled Framework with Splunk SOC Triage

Abstract page for arXiv paper 2603.23966: Policy-Guided Threat Hunting: An LLM enabled Framework with Splunk SOC Triage

arXiv - AI · 4 min ·
More in Llms: This Week Guide Trending

No comments

No comments yet. Be the first to comment!

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime