[2602.13379] Unsafer in Many Turns: Benchmarking and Defending Multi-Turn Safety Risks in Tool-Using Agents

[2602.13379] Unsafer in Many Turns: Benchmarking and Defending Multi-Turn Safety Risks in Tool-Using Agents

arXiv - Machine Learning 4 min read Article

Summary

This article presents a new benchmark, MT-AgentRisk, for evaluating safety risks in multi-turn interactions of tool-using agents, revealing significant safety degradation and proposing a novel defense mechanism, ToolShield.

Why It Matters

As AI agents become more capable, ensuring their safety during complex interactions is crucial. This research highlights the growing risks associated with multi-turn engagements and provides a framework for assessing and mitigating these risks, which is essential for the future of AI safety.

Key Takeaways

  • Multi-turn interactions in AI agents introduce significant safety risks.
  • The MT-AgentRisk benchmark reveals a 16% increase in attack success rates in multi-turn settings.
  • ToolShield is a proposed defense mechanism that reduces attack success rates by 30% without requiring prior training.
  • The study emphasizes the need for systematic safety evaluations in AI tools.
  • This research contributes to the ongoing discourse on AI safety and responsible AI deployment.

Computer Science > Cryptography and Security arXiv:2602.13379 (cs) [Submitted on 13 Feb 2026] Title:Unsafer in Many Turns: Benchmarking and Defending Multi-Turn Safety Risks in Tool-Using Agents Authors:Xu Li, Simon Yu, Minzhou Pan, Yiyou Sun, Bo Li, Dawn Song, Xue Lin, Weiyan Shi View a PDF of the paper titled Unsafer in Many Turns: Benchmarking and Defending Multi-Turn Safety Risks in Tool-Using Agents, by Xu Li and Simon Yu and Minzhou Pan and Yiyou Sun and Bo Li and Dawn Song and Xue Lin and Weiyan Shi View PDF HTML (experimental) Abstract:LLM-based agents are becoming increasingly capable, yet their safety lags behind. This creates a gap between what agents can do and should do. This gap widens as agents engage in multi-turn interactions and employ diverse tools, introducing new risks overlooked by existing benchmarks. To systematically scale safety testing into multi-turn, tool-realistic settings, we propose a principled taxonomy that transforms single-turn harmful tasks into multi-turn attack sequences. Using this taxonomy, we construct MT-AgentRisk (Multi-Turn Agent Risk Benchmark), the first benchmark to evaluate multi-turn tool-using agent safety. Our experiments reveal substantial safety degradation: the Attack Success Rate (ASR) increases by 16% on average across open and closed models in multi-turn settings. To close this gap, we propose ToolShield, a training-free, tool-agnostic, self-exploration defense: when encountering a new tool, the agent autonomously g...

Related Articles

[2603.16105] Frequency Matters: Fast Model-Agnostic Data Curation for Pruning and Quantization
Llms

[2603.16105] Frequency Matters: Fast Model-Agnostic Data Curation for Pruning and Quantization

Abstract page for arXiv paper 2603.16105: Frequency Matters: Fast Model-Agnostic Data Curation for Pruning and Quantization

arXiv - AI · 4 min ·
[2603.09643] MM-tau-p$^2$: Persona-Adaptive Prompting for Robust Multi-Modal Agent Evaluation in Dual-Control Settings
Llms

[2603.09643] MM-tau-p$^2$: Persona-Adaptive Prompting for Robust Multi-Modal Agent Evaluation in Dual-Control Settings

Abstract page for arXiv paper 2603.09643: MM-tau-p$^2$: Persona-Adaptive Prompting for Robust Multi-Modal Agent Evaluation in Dual-Contro...

arXiv - AI · 4 min ·
[2603.07339] Agora: Teaching the Skill of Consensus-Finding with AI Personas Grounded in Human Voice
Llms

[2603.07339] Agora: Teaching the Skill of Consensus-Finding with AI Personas Grounded in Human Voice

Abstract page for arXiv paper 2603.07339: Agora: Teaching the Skill of Consensus-Finding with AI Personas Grounded in Human Voice

arXiv - AI · 4 min ·
[2602.00185] QUASAR: A Universal Autonomous System for Atomistic Simulation and a Benchmark of Its Capabilities
Llms

[2602.00185] QUASAR: A Universal Autonomous System for Atomistic Simulation and a Benchmark of Its Capabilities

Abstract page for arXiv paper 2602.00185: QUASAR: A Universal Autonomous System for Atomistic Simulation and a Benchmark of Its Capabilities

arXiv - AI · 4 min ·
More in Llms: This Week Guide Trending

No comments

No comments yet. Be the first to comment!

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime