[2510.08847] What Is Your Agent's GPA? A Framework for Evaluating Agent Goal-Plan-Action Alignment
Computer Science > Artificial Intelligence
arXiv:2510.08847 (cs)
[Submitted on 9 Oct 2025 (v1), last revised 27 Mar 2026 (this version, v2)]

Title: What Is Your Agent's GPA? A Framework for Evaluating Agent Goal-Plan-Action Alignment

Authors: Allison Sihan Jia, Daniel Huang, Nikhil Vytla, Seung Won Wilson Yoo, Nirvika Choudhury, Shayak Sen, John C. Mitchell, Anupam Datta

Abstract: We introduce the Agent GPA (Goal-Plan-Action) framework, driven by the fundamental insight that critical agent failures emerge at the intersections of setting goals, devising plans, and executing actions. We operationalize the framework with a factorized suite of LLM judges designed to measure distinct elements of Goal-Plan-Act alignment. To make this methodology scalable and generalizable across diverse agent architectures and datasets, we use state-of-the-art automated prompt optimization techniques to systematically generate domain-specific evaluation criteria. We validate this approach across three benchmarks: a multi-agent research setting (TRAIL/GAIA), a single coding agent setting (TRAIL/SWE-bench), and a private, enterprise data-agent setting (Snowflake Intelligence). Extensive evaluation on TRAIL/GAIA demonstrates the core validity of the framework, which identifies a broad range of agent failures (95% of ...