[2602.22242] Analysis of LLMs Against Prompt Injection and Jailbreak Attacks
Summary
This paper analyzes the vulnerabilities of Large Language Models (LLMs) to prompt injection and jailbreak attacks, evaluating various defense mechanisms across multiple models.
Why It Matters
As LLMs become integral to various applications, understanding their security vulnerabilities is crucial. This research highlights the risks associated with prompt-based attacks and evaluates potential defenses, informing developers and organizations about necessary precautions.
Key Takeaways
- LLMs are susceptible to prompt injection and jailbreak attacks, necessitating thorough security assessments.
- Behavioral variations among different LLMs can lead to inconsistent responses to attacks.
- Lightweight defense mechanisms can mitigate basic attacks but may fail against complex, reasoning-heavy prompts.
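The lightweight, inference-time defences the paper evaluates operate as filters in front of the model, with no retraining or fine-tuning. A minimal sketch of such a filter is shown below; the pattern list and function names are illustrative assumptions, not the paper's actual rules.

```python
import re

# Hypothetical injection phrasings -- the paper does not publish its filter rules.
INJECTION_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"disregard .*system prompt",
    r"reveal .*(system prompt|hidden instructions)",
]

def is_suspicious(prompt: str) -> bool:
    """Flag prompts matching known injection phrasings (case-insensitive)."""
    lowered = prompt.lower()
    return any(re.search(p, lowered) for p in INJECTION_PATTERNS)

def guarded_generate(prompt: str, generate) -> str:
    """Screen the prompt before calling the model; refuse if flagged."""
    if is_suspicious(prompt):
        return "Request blocked by inference-time filter."
    return generate(prompt)
```

This kind of surface-pattern matching illustrates the paper's finding: it catches straightforward attacks that restate known injection phrases, but a long, reasoning-heavy prompt that paraphrases the same intent never matches a fixed pattern and passes straight through to the model.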
Computer Science > Cryptography and Security
arXiv:2602.22242 (cs)
[Submitted on 24 Feb 2026]
Title: Analysis of LLMs Against Prompt Injection and Jailbreak Attacks
Authors: Piyush Jaiswal, Aaditya Pratap, Shreyansh Saraswati, Harsh Kasyap, Somanath Tripathy

Abstract: Large Language Models (LLMs) are widely deployed in real-world systems. Given their broad applicability, prompt engineering has become an efficient tool for resource-scarce organizations to adopt LLMs for their own purposes. At the same time, LLMs are vulnerable to prompt-based attacks, so analyzing this risk has become a critical security requirement. This work evaluates prompt-injection and jailbreak vulnerability using a large, manually curated dataset across multiple open-source LLMs, including Phi, Mistral, DeepSeek-R1, Llama 3.2, Qwen, and Gemma variants. We observe significant behavioural variation across models, including refusal responses and complete silent non-responsiveness triggered by internal safety mechanisms. Furthermore, we evaluate several lightweight, inference-time defence mechanisms that operate as filters without any retraining or GPU-intensive fine-tuning. Although these defences mitigate straightforward attacks, they are consistently bypassed by long, reasoning-heavy prompts.

Subjects: Cryptography and Security