[2602.09937] Why Do AI Agents Systematically Fail at Cloud Root Cause Analysis?
About this article
Abstract page for arXiv paper 2602.09937: Why Do AI Agents Systematically Fail at Cloud Root Cause Analysis?
Computer Science > Artificial Intelligence arXiv:2602.09937 (cs) [Submitted on 10 Feb 2026 (v1), last revised 4 Mar 2026 (this version, v2)] Title:Why Do AI Agents Systematically Fail at Cloud Root Cause Analysis? Authors:Taeyoon Kim, Woohyeok Park, Hoyeong Yun, Kyungyong Lee View a PDF of the paper titled Why Do AI Agents Systematically Fail at Cloud Root Cause Analysis?, by Taeyoon Kim and 3 other authors View PDF Abstract:Failures in large-scale cloud systems incur substantial financial losses, making automated Root Cause Analysis (RCA) essential for operational stability. Recent efforts leverage Large Language Model (LLM) agents to automate this task, yet existing systems exhibit low detection accuracy even with capable models, and current evaluation frameworks assess only final answer correctness without revealing why the agent's reasoning failed. This paper presents a process level failure analysis of LLM-based RCA agents. We execute the full OpenRCA benchmark across five LLM models, producing 1,675 agent runs, and classify observed failures into 12 pitfall types across intra-agent reasoning, inter-agent communication, and agent-environment interaction. Our analysis reveals that the most prevalent pitfalls, notably hallucinated data interpretation and incomplete exploration, persist across all models regardless of capability tier, indicating that these failures originate from the shared agent architecture rather than from individual model limitations. Controlled miti...