[2603.25342] From Intent to Evidence: A Categorical Approach for Structural Evaluation of Deep Research Agents
arXiv:2603.25342 (cs), Computer Science > Machine Learning
Submitted on 26 Mar 2026

Authors: Shuoling Liu, Zhiquan Tan, Kun Yi, Hui Wu, Yihan Li, Jiangpeng Yan, Liyuan Chen, Kai Chen, Qiang Yang

Abstract: Although deep research agents (DRAs) have emerged as a promising paradigm for complex information synthesis, their evaluation remains constrained by ad hoc empirical benchmarks. These heuristic approaches do not rigorously model agent behavior or adequately stress-test long-horizon synthesis and ambiguity resolution. To bridge this gap, we formalize DRA behavior through the lens of category theory, modeling the deep research workflow as a composition of structure-preserving maps (functors). Grounded in this theoretical framework, we introduce a novel mechanism-aware benchmark with 296 questions designed to stress-test agents along four interpretable axes: traversing sequential connectivity chains, verifying intersections within V-structure pullbacks, imposing topological ordering on retrieved substructures, and performing ontological falsification via the Yoneda Probe. Our rigorous evaluation of 11 leading models establishes a persistently low baseline, with the state-of-the-art achi...
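The abstract names four structural patterns but does not define them formally. The Python sketch below shows one possible reading, assuming relations are modeled as finite functions between sets (the category of sets). In this reading, a sequential chain is function composition, a V-structure pullback is the set of pairs whose images agree, topological ordering is a dependency sort over retrieval steps, and the Yoneda Probe is approximated as a check that rejects a premise when no entity has the claimed relational profile. That last reading is only a loose analogy to the Yoneda idea that an object is characterized by its relationships. The toy data and the helper names (`compose`, `pullback`, `ordered_steps`, `matching_entities`, `deps`, `people_relations`) are hypothetical and do not come from the paper's benchmark.

```python
from graphlib import TopologicalSorter

# Hypothetical toy data (not from the paper): each relation is a finite
# function between sets, stored as a dict from domain element to image.
born_in = {"alice": "paris", "bob": "lyon", "carol": "paris"}
city_country = {"paris": "france", "lyon": "france"}
works_at = {"alice": "acme", "bob": "globex", "carol": "initech"}
hq_city = {"acme": "paris", "globex": "berlin", "initech": "paris"}


def compose(g, f):
    """Sequential chain: g . f, i.e. apply f first, then g (a two-hop lookup)."""
    return {x: g[y] for x, y in f.items() if y in g}


def pullback(f, g):
    """V-structure: pullback of f: A -> C and g: B -> C in the category of sets,
    i.e. every pair (a, b) whose images agree, f(a) == g(b)."""
    return {(a, b) for a, fa in f.items() for b, gb in g.items() if fa == gb}


def ordered_steps(deps):
    """Topological ordering: list steps so each one follows all its prerequisites.
    `deps` maps each step to the set of steps it depends on."""
    return list(TopologicalSorter(deps).static_order())


def matching_entities(constraints, relations):
    """Falsification check (assumed reading of the Yoneda Probe): return every
    entity whose images under the named relations equal all claimed values.
    An empty result means the premise describes no entity in the data."""
    domain = set().union(*(rel.keys() for rel in relations.values()))
    return {
        x
        for x in domain
        if all(relations[name].get(x) == value for name, value in constraints.items())
    }


# Hypothetical research plan: which retrieved sub-results each step needs.
deps = {
    "answer": {"chain", "pullback"},
    "chain": {"fetch_born_in", "fetch_country"},
    "pullback": {"fetch_born_in", "fetch_hq"},
}
people_relations = {"born_in": born_in, "works_at": works_at}

if __name__ == "__main__":
    print(compose(city_country, born_in))  # country of birth for each person
    print(sorted(pullback(born_in, hq_city)))  # (person, company) pairs sharing a city
    print(ordered_steps(deps))  # fetch steps first, "answer" last
    # A false premise: nobody in the data is born in Paris and works at Globex.
    print(matching_entities({"born_in": "paris", "works_at": "globex"}, people_relations))
```

Working with finite sets keeps each construction concrete: composition is dictionary chaining, and the pullback is a filtered Cartesian product. The paper's own formalization may use richer categories and different definitions for each axis.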