[2603.12564] AgentDrift: Unsafe Recommendation Drift Under Tool

[2603.12564] AgentDrift: Unsafe Recommendation Drift Under Tool Corruption Hidden by Ranking Metrics in LLM Agents

arXiv - AI March 31, 2026 4 min read

About this article

Abstract page for arXiv paper 2603.12564: AgentDrift: Unsafe Recommendation Drift Under Tool Corruption Hidden by Ranking Metrics in LLM Agents

Computer Science > Computation and Language arXiv:2603.12564 (cs) [Submitted on 13 Mar 2026 (v1), last revised 30 Mar 2026 (this version, v4)] Title:AgentDrift: Unsafe Recommendation Drift Under Tool Corruption Hidden by Ranking Metrics in LLM Agents Authors:Zekun Wu, Adriano Koshiyama, Sahan Bulathwela, Maria Perez-Ortiz View a PDF of the paper titled AgentDrift: Unsafe Recommendation Drift Under Tool Corruption Hidden by Ranking Metrics in LLM Agents, by Zekun Wu and 3 other authors View PDF HTML (experimental) Abstract:Tool-augmented LLM agents increasingly operate as multi-turn advisors in high-stakes domains, yet their evaluation relies on ranking metrics that measure what is recommended but not whether it is safe for the user. We present a paired-trajectory protocol that replays real financial dialogues under clean and contaminated tool-output conditions across eight LLMs (7B to frontier), decomposing divergence into information-channel and memory-channel mechanisms. We observe evaluation blindness: recommendation quality is preserved under contamination (UPR~1.0) while risk-inappropriate products appear in 65-93% of turns, invisible to standard NDCG. Violations are information-channel-driven, emerge at turn 1, and persist without self-correction over 23-step trajectories. Even non-extreme perturbations (within-band corruption, narrative-only attacks) evade threshold monitors while producing significant drift. Susceptibility scales with instruction-following fidelity...

Originally published on March 31, 2026. Curated by AI News.

Llms

Depth-first pruning seems to transfer from GPT-2 to Llama (unexpectedly well)

TL;DR: Removing the right transformer layers (instead of shrinking all layers) gives smaller, faster models with minimal quality loss — a...

Reddit - Artificial Intelligence · 1 min · 41 minutes ago

Llms

[2603.23966] Policy-Guided Threat Hunting: An LLM enabled Framework with Splunk SOC Triage

Abstract page for arXiv paper 2603.23966: Policy-Guided Threat Hunting: An LLM enabled Framework with Splunk SOC Triage

arXiv - AI · 4 min · about 2 hours ago

Llms

[2603.16790] InCoder-32B: Code Foundation Model for Industrial Scenarios

Abstract page for arXiv paper 2603.16790: InCoder-32B: Code Foundation Model for Industrial Scenarios

arXiv - AI · 4 min · about 2 hours ago

Llms

[2603.16430] EngGPT2: Sovereign, Efficient and Open Intelligence

Abstract page for arXiv paper 2603.16430: EngGPT2: Sovereign, Efficient and Open Intelligence

arXiv - AI · 4 min · about 2 hours ago

[2603.12564] AgentDrift: Unsafe Recommendation Drift Under Tool Corruption Hidden by Ranking Metrics in LLM Agents

About this article

Related Articles

Depth-first pruning seems to transfer from GPT-2 to Llama (unexpectedly well)

[2603.23966] Policy-Guided Threat Hunting: An LLM enabled Framework with Splunk SOC Triage

[2603.16790] InCoder-32B: Code Foundation Model for Industrial Scenarios

[2603.16430] EngGPT2: Sovereign, Efficient and Open Intelligence

No comments

Stay updated with AI News