[2604.06436] The Defense Trilemma: Why Prompt Injection Defense Wrappers Fail?

arXiv - AI April 09, 2026 3 min read

Computer Science > Cryptography and Security

arXiv:2604.06436 (cs)

[Submitted on 7 Apr 2026]

Title: The Defense Trilemma: Why Prompt Injection Defense Wrappers Fail?

Authors: Manish Bhatt, Sarthak Munshi, Vineeth Sai Narajala, Idan Habler, Ammar Al-Kahfah, Ken Huang, Blake Gatto

Abstract: We prove that no continuous, utility-preserving wrapper defense (a function $D: X\to X$ that preprocesses inputs before the model sees them) can make all outputs strictly safe for a language model with connected prompt space, and we characterize exactly where every such defense must fail. We establish three results under successively stronger hypotheses: boundary fixation (the defense must leave some threshold-level inputs unchanged); an $\epsilon$-robust constraint (under Lipschitz regularity, a positive-measure band around fixed boundary points remains near-threshold); and a persistent unsafe region (under a transversality condition, a positive-measure subset of inputs remains strictly unsafe). These constitute a defense trilemma: continuity, utility preservation, and completeness cannot coexist. We prove parallel discrete results requiring no topology, and extend to multi-turn interactions, stochastic defenses, and capacity-parity settings. The results do not preclude training-time alignment, architectural changes, or defenses that sacr...
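The boundary-fixation claim in the abstract can be illustrated with a one-dimensional toy. This is a minimal sketch under assumptions that are not taken from the paper: the prompt space is the interval [0, 1], the score `unsafety(x) = x` and the threshold `TAU = 0.5` are hypothetical, and "utility-preserving" is read as "the defense leaves already-safe inputs unchanged". Under that reading, a continuous defense that is the identity on every safe input is forced to fix the boundary point too, so its output there sits exactly at the threshold and is not strictly safe.

```python
TAU = 0.5  # hypothetical safety threshold (assumption, not from the paper)


def unsafety(x: float) -> float:
    """Toy continuous unsafety score on the prompt space X = [0, 1]."""
    return x


def is_strictly_safe(x: float) -> bool:
    """An input is strictly safe when its score is strictly below TAU."""
    return unsafety(x) < TAU


def clamp_defense(x: float) -> float:
    """Hypothetical wrapper D: X -> X.

    Continuous, and the identity on every strictly safe input
    (the toy reading of utility preservation). Unsafe inputs are
    pulled back to the boundary.
    """
    return min(x, TAU)


def fixed_boundary_points(defense, grid_size: int = 1001, tol: float = 1e-9):
    """Grid points at threshold level that the defense leaves unchanged."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return [
        x
        for x in grid
        if abs(unsafety(x) - TAU) <= tol and abs(defense(x) - x) <= tol
    ]


if __name__ == "__main__":
    print(clamp_defense(0.3))                     # safe input passes through
    print(is_strictly_safe(clamp_defense(0.9)))   # clamped to the threshold
    print(fixed_boundary_points(clamp_defense))   # the fixed boundary point
```

In this toy the clamp is about the best a continuous, identity-on-safe wrapper can do: it moves every unsafe input to the boundary, yet the boundary point itself is fixed and scores exactly `TAU`. Making that output strictly safe would require either a jump at the boundary (giving up continuity) or altering inputs that were already safe (giving up utility preservation).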

Originally published on April 09, 2026. Curated by AI News.
