[2605.04785] AgentTrust: Runtime Safety Evaluation and Interception for AI Agent Tool Use

[2605.04785] AgentTrust: Runtime Safety Evaluation and Interception for AI Agent Tool Use

arXiv - AI 3 min read

About this article

Abstract page for arXiv paper 2605.04785: AgentTrust: Runtime Safety Evaluation and Interception for AI Agent Tool Use

Computer Science > Artificial Intelligence arXiv:2605.04785 (cs) [Submitted on 6 May 2026] Title:AgentTrust: Runtime Safety Evaluation and Interception for AI Agent Tool Use Authors:Chenglin Yang View a PDF of the paper titled AgentTrust: Runtime Safety Evaluation and Interception for AI Agent Tool Use, by Chenglin Yang View PDF HTML (experimental) Abstract:Modern AI agents execute real-world side effects through tool calls such as file operations, shell commands, HTTP requests, and database queries. A single unsafe action, including accidental deletion, credential exposure, or data exfiltration, can cause irreversible harm. Existing defenses are incomplete: post-hoc benchmarks measure behavior after execution, static guardrails miss obfuscation and multi-step context, and infrastructure sandboxes constrain where code runs without understanding what an action means. We present AgentTrust, a runtime safety layer that intercepts agent tool calls before execution and returns a structured verdict: allow, warn, block, or review. AgentTrust combines a shell deobfuscation normalizer, SafeFix suggestions for safer alternatives, RiskChain detection for multi-step attack chains, and a cache-aware LLM-as-Judge for ambiguous inputs. We release a 300-scenario benchmark across six risk categories and an additional 630 independently constructed real-world adversarial scenarios. On the internal benchmark, the production-only ruleset achieves 95.0% verdict accuracy and 73.7% risk-level acc...

Originally published on May 07, 2026. Curated by AI News.

Related Articles

Implementing advanced AI technologies in finance | MIT Technology Review
Ai Safety

Implementing advanced AI technologies in finance | MIT Technology Review

In finance departments that have long been defined by precision and control, AI has arrived less as a neatly managed upgrade than as a qu...

MIT Technology Review - AI · 4 min ·
[2602.07026] Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models
Llms

[2602.07026] Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models

Abstract page for arXiv paper 2602.07026: Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models

arXiv - AI · 4 min ·
[2511.22893] Switching-time bioprocess control with pulse-width-modulated optogenetics
Machine Learning

[2511.22893] Switching-time bioprocess control with pulse-width-modulated optogenetics

Abstract page for arXiv paper 2511.22893: Switching-time bioprocess control with pulse-width-modulated optogenetics

arXiv - AI · 4 min ·
[2407.04183] Seeing Like an AI: How LLMs Apply (and Misapply) Wikipedia Neutrality Norms
Llms

[2407.04183] Seeing Like an AI: How LLMs Apply (and Misapply) Wikipedia Neutrality Norms

Abstract page for arXiv paper 2407.04183: Seeing Like an AI: How LLMs Apply (and Misapply) Wikipedia Neutrality Norms

arXiv - AI · 4 min ·
More in Ai Safety: This Week Guide Trending

No comments

No comments yet. Be the first to comment!

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime