[2605.04785] AgentTrust: Runtime Safety Evaluation and Interception for AI Agent Tool Use
Computer Science > Artificial Intelligence

arXiv:2605.04785 (cs) [Submitted on 6 May 2026]

Title: AgentTrust: Runtime Safety Evaluation and Interception for AI Agent Tool Use
Authors: Chenglin Yang

Abstract: Modern AI agents execute real-world side effects through tool calls such as file operations, shell commands, HTTP requests, and database queries. A single unsafe action, including accidental deletion, credential exposure, or data exfiltration, can cause irreversible harm. Existing defenses are incomplete: post-hoc benchmarks measure behavior after execution, static guardrails miss obfuscation and multi-step context, and infrastructure sandboxes constrain where code runs without understanding what an action means. We present AgentTrust, a runtime safety layer that intercepts agent tool calls before execution and returns a structured verdict: allow, warn, block, or review. AgentTrust combines a shell deobfuscation normalizer, SafeFix suggestions for safer alternatives, RiskChain detection for multi-step attack chains, and a cache-aware LLM-as-Judge for ambiguous inputs. We release a 300-scenario benchmark across six risk categories and an additional 630 independently constructed real-world adversarial scenarios. On the internal benchmark, the production-only ruleset achieves 95.0% verdict accuracy and 73.7% risk-level acc...
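The abstract describes an interception layer that normalizes a tool call, applies rules, and returns one of four structured verdicts before execution. A minimal sketch of that flow is below; the paper's actual ruleset, component interfaces, and names other than the four verdicts are not given in the abstract, so `normalize`, `evaluate`, and `intercept` here are purely illustrative toy stand-ins, not AgentTrust's implementation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional

class Verdict(Enum):
    # The four structured verdicts named in the abstract.
    ALLOW = "allow"
    WARN = "warn"
    BLOCK = "block"
    REVIEW = "review"

@dataclass
class Decision:
    verdict: Verdict
    reason: str
    safe_fix: Optional[str] = None  # safer alternative, in the spirit of SafeFix

def normalize(command: str) -> str:
    """Toy stand-in for the shell deobfuscation normalizer:
    collapse whitespace and lowercase so rules match trivially
    obfuscated variants of the same command."""
    return " ".join(command.split()).lower()

def evaluate(tool: str, command: str) -> Decision:
    """Hypothetical ruleset illustrating pre-execution classification."""
    cmd = normalize(command)
    if tool == "shell" and "rm -rf /" in cmd:
        return Decision(Verdict.BLOCK, "destructive filesystem operation")
    if tool == "shell" and cmd.startswith("rm "):
        return Decision(Verdict.WARN, "file deletion",
                        safe_fix=command.replace("rm ", "rm -i ", 1))
    if tool == "http" and "api_key" in cmd:
        return Decision(Verdict.REVIEW, "possible credential exposure")
    return Decision(Verdict.ALLOW, "no rule matched")

def intercept(tool: str, command: str,
              execute: Callable[[str], str]) -> str:
    """Run the tool call only if the verdict permits it; a blocked
    call never reaches the executor."""
    decision = evaluate(tool, command)
    if decision.verdict is Verdict.BLOCK:
        raise PermissionError(f"blocked: {decision.reason}")
    return execute(command)
```

Note that the normalizer runs before rule matching, so an obfuscated `rm   -RF  /` is classified the same as the canonical form; this ordering is the point of pairing deobfuscation with a static ruleset.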