[2603.19423] The Autonomy Tax: Defense Training Breaks LLM Agents
Computer Science > Cryptography and Security
arXiv:2603.19423 (cs)
[Submitted on 19 Mar 2026]

Title: The Autonomy Tax: Defense Training Breaks LLM Agents
Authors: Shawn Li, Yue Zhao

Abstract: Large language model (LLM) agents increasingly rely on external tools (file operations, API calls, database transactions) to autonomously complete complex multi-step tasks. Practitioners deploy defense-trained models to protect against prompt injection attacks that manipulate agent behavior through malicious observations or retrieved content. We reveal a fundamental capability-alignment paradox: defense training designed to improve safety systematically destroys agent competence while failing to prevent sophisticated attacks. Evaluating defended models against undefended baselines across 97 agent tasks and 1,000 adversarial prompts, we uncover three systematic biases unique to multi-step agents. Agent incompetence bias manifests as immediate tool-execution breakdown: models refuse or generate invalid actions on benign tasks before observing any external content. Cascade amplification bias causes early failures to propagate through retry loops, pushing defended models to time out on 99% of tasks, compared to 13% for baselines. Trigger bias leads to paradoxical security degradation where defended model...