[2604.09285] SAGE: A Service Agent Graph-guided Evaluation Benchmark
Computer Science > Artificial Intelligence
arXiv:2604.09285 (cs) [Submitted on 10 Apr 2026]

Title: SAGE: A Service Agent Graph-guided Evaluation Benchmark
Authors: Ling Shi, Yuqin Dai, Ziyin Wang, Ning Gao, Wei Zhang, Chaozheng Wang, Yujie Wang, Wei He, Jinpeng Wang, Deiyi Xiong

Abstract: The development of Large Language Models (LLMs) has catalyzed automation in customer service, yet benchmarking their performance remains challenging. Existing benchmarks predominantly rely on static paradigms and single-dimensional metrics, failing to account for diverse user behaviors or the strict adherence to structured Standard Operating Procedures (SOPs) required in real-world deployments. To bridge this gap, we propose SAGE (Service Agent Graph-guided Evaluation), a universal multi-agent benchmark for automated, dual-axis assessment. SAGE formalizes unstructured SOPs into Dynamic Dialogue Graphs, enabling precise verification of logical compliance and comprehensive path coverage. We introduce an Adversarial Intent Taxonomy and a modular Extension Mechanism, enabling low-cost deployment across domains and facilitating automated dialogue data synthesis. Evaluation is conducted via a framework where Judge Agents and a Rule Engine analyze interactions between User and Service Agents to generate deterministic ground truth. Extensive experiments...
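To make the graph-guided idea concrete, here is a minimal sketch of what encoding an SOP as a directed dialogue graph could look like, with a compliance check and an edge-coverage metric. The graph structure, node names, and functions below are illustrative assumptions for a toy customer-service SOP, not the paper's actual implementation.

```python
# Hypothetical SOP encoded as a directed dialogue graph:
# each node maps to the set of dialogue states allowed to follow it.
SOP_GRAPH = {
    "greet": {"verify_identity"},
    "verify_identity": {"ask_issue", "end"},
    "ask_issue": {"resolve", "escalate"},
    "resolve": {"end"},
    "escalate": {"end"},
    "end": set(),
}

def is_compliant(trace):
    """True iff every consecutive transition in the trace is an SOP edge."""
    return all(b in SOP_GRAPH.get(a, set()) for a, b in zip(trace, trace[1:]))

def edge_coverage(traces):
    """Fraction of SOP edges exercised by a collection of dialogue traces."""
    all_edges = {(a, b) for a, nxts in SOP_GRAPH.items() for b in nxts}
    seen = {(a, b) for t in traces
            for a, b in zip(t, t[1:]) if (a, b) in all_edges}
    return len(seen) / len(all_edges)

ok = ["greet", "verify_identity", "ask_issue", "resolve", "end"]
bad = ["greet", "ask_issue", "end"]  # skips identity verification

print(is_compliant(ok))    # True
print(is_compliant(bad))   # False
print(edge_coverage([ok])) # 4 of 7 SOP edges covered
```

A rule engine in this spirit yields deterministic ground truth: a trace either follows the graph or it does not, and coverage quantifies how much of the SOP a test set exercises.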