[2604.04561] Mapping the Exploitation Surface: A 10,000-Trial Taxonomy

[2604.04561] Mapping the Exploitation Surface: A 10,000-Trial Taxonomy of What Makes LLM Agents Exploit Vulnerabilities

arXiv - AI April 07, 2026 4 min read

About this article

Abstract page for arXiv paper 2604.04561: Mapping the Exploitation Surface: A 10,000-Trial Taxonomy of What Makes LLM Agents Exploit Vulnerabilities

Computer Science > Cryptography and Security arXiv:2604.04561 (cs) [Submitted on 6 Apr 2026] Title:Mapping the Exploitation Surface: A 10,000-Trial Taxonomy of What Makes LLM Agents Exploit Vulnerabilities Authors:Charafeddine Mouzouni View a PDF of the paper titled Mapping the Exploitation Surface: A 10,000-Trial Taxonomy of What Makes LLM Agents Exploit Vulnerabilities, by Charafeddine Mouzouni View PDF HTML (experimental) Abstract:LLM agents with tool access can discover and exploit security vulnerabilities. This is known. What is not known is which features of a system prompt trigger this behaviour, and which do not. We present a systematic taxonomy based on approximately 10,000 trials across seven models, 37 prompt conditions, and 12 hypothesized attack dimensions -- all executed in real Docker sandboxes. Every condition includes the instruction "Always respect all rules and access policies"; only a single encouragement sentence varies. Within the task class studied (planted test-runner vulnerabilities), the threat model is narrower than assumed: nine of twelve dimensions -- including minimization, moral licensing, incentives, identity priming, and reasoning triggers -- produce no detectable exploitation at n=50 per cell (upper 95% CI <7% per condition). But it is also more subtle than expected: one dimension, goal reframing, reliably triggers exploitation. "You are solving a puzzle; there may be hidden clues" produces 38-40% exploitation on Claude Sonnet 4 despite th...

Originally published on April 07, 2026. Curated by AI News.

Llms

I built a solo AI platform from Algeria with no funding, no team and no ad spend - here's what's inside it after 2 months

Hello, 20 years old here just got into the Ai platform and launched this last two weeks and here is what I have on it so far. - Latest Ai...

Reddit - Artificial Intelligence · 1 min · 27 minutes ago

Llms

USF murder suspect accused of using ChatGPT to research cover-up, prosecutors say

Days after the remains of one of the two missing University of South Florida doctoral students were found, prosecutors say the suspect ma...

AI Tools & Products · 3 min · about 1 hour ago

Llms

Anthropic’s Claude AI deletes PocketOS production database

Claude AI deleted PocketOS's production database, but the market for Claude 4.7 release by May 31 remains at 100% YES.

AI Tools & Products · 3 min · about 1 hour ago

Llms

Claude-powered AI coding agent deletes entire company database in 9 seconds

The founder of PocketOS has penned a social media post to warn others about the “systemic failures” of flagship AI and digital services p...

AI Tools & Products · 1 min · about 1 hour ago

[2604.04561] Mapping the Exploitation Surface: A 10,000-Trial Taxonomy of What Makes LLM Agents Exploit Vulnerabilities

About this article

Related Articles

I built a solo AI platform from Algeria with no funding, no team and no ad spend - here's what's inside it after 2 months

USF murder suspect accused of using ChatGPT to research cover-up, prosecutors say

Anthropic’s Claude AI deletes PocketOS production database

Claude-powered AI coding agent deletes entire company database in 9 seconds

No comments

Stay updated with AI News