[2603.24631] TRAJEVAL: Decomposing Code Agent Trajectories for Fine-Grained Diagnosis
Computer Science > Software Engineering
arXiv:2603.24631 (cs)
[Submitted on 25 Mar 2026]

Title: TRAJEVAL: Decomposing Code Agent Trajectories for Fine-Grained Diagnosis
Authors: Myeongsoo Kim, Dingmin Wang, Siwei Cui, Farima Farmahinifarahani, Shweta Garg, Baishakhi Ray, Terry Yue Zhuo, Rajdeep Mukherjee, Varun Kumar

Abstract: Code agents can autonomously resolve GitHub issues, yet when they fail, current evaluation provides no visibility into where or why. Metrics such as Pass@1 collapse an entire execution into a single binary outcome, making it difficult to identify where and why the agent went wrong. To address this limitation, we introduce TRAJEVAL, a diagnostic framework that decomposes agent trajectories into three interpretable stages: search (file localization), read (function comprehension), and edit (modification targeting). For each stage, we compute precision and recall by comparing against reference patches. Analyzing 16,758 trajectories across three agent architectures and seven models, we find universal inefficiencies (all agents examine approximately 22x more functions than necessary) yet distinct failure modes: GPT-5 locates relevant code but targets edits incorrectly, while Qwen-32B fails at file discovery entirely. We validate that these diagnostics are predictive, achieving model-level Pass@1 prediction wit...
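The per-stage precision and recall described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the set-based definitions, the `StageScore` and `score_trajectory` names, the `file::function` identifier format, and the toy data are all assumptions made for illustration, since this excerpt does not give the exact formulas.

```python
from __future__ import annotations

from dataclasses import dataclass

# Stage names follow the abstract: search (files), read (functions), edit (targets).
STAGES = ("search", "read", "edit")


@dataclass(frozen=True)
class StageScore:
    precision: float
    recall: float


def score_stage(touched: set[str], reference: set[str]) -> StageScore:
    """Assumed set-based scoring: compare what the agent touched in one
    stage against what the reference patch required for that stage."""
    hits = len(touched & reference)
    precision = hits / len(touched) if touched else 0.0
    recall = hits / len(reference) if reference else 0.0
    return StageScore(precision, recall)


def score_trajectory(
    trajectory: dict[str, set[str]],
    reference: dict[str, set[str]],
) -> dict[str, StageScore]:
    """Score every stage of one trajectory (hypothetical data layout)."""
    return {
        stage: score_stage(trajectory.get(stage, set()), reference[stage])
        for stage in STAGES
    }


# Toy example: the agent finds the right file and reads the right function,
# but edits the wrong one.
trajectory = {
    "search": {"a.py", "b.py", "c.py", "d.py"},
    "read": {"a.py::f", "a.py::g", "b.py::h"},
    "edit": {"a.py::g"},
}
reference = {
    "search": {"a.py"},
    "read": {"a.py::f"},
    "edit": {"a.py::f"},
}
scores = score_trajectory(trajectory, reference)
for stage in STAGES:
    print(stage, scores[stage])
```

In this toy case, search and read recall are both 1.0 while edit precision and recall are 0.0, which is the kind of pattern the abstract describes as locating relevant code but targeting edits incorrectly. Low search precision (0.25 here) reflects examining more files than the reference patch needed.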