[2603.03761] AgentSelect: Benchmark for Narrative Query-to-Agent Recommendation
Computer Science > Artificial Intelligence
arXiv:2603.03761 (cs)
[Submitted on 4 Mar 2026]

Title: AgentSelect: Benchmark for Narrative Query-to-Agent Recommendation
Authors: Yunxiao Shi, Wujiang Xu, Tingwei Chen, Haoning Shang, Ling Yang, Yunfeng Wan, Zhuo Cao, Xing Zi, Dimitris N. Metaxas, Min Xu

Abstract: LLM agents are rapidly becoming the practical interface for task automation, yet the ecosystem lacks a principled way to choose among an exploding space of deployable configurations. Existing LLM leaderboards and tool/agent benchmarks evaluate components in isolation and remain fragmented across tasks, metrics, and candidate pools, leaving a critical research gap: there is little query-conditioned supervision for learning to recommend end-to-end agent configurations that couple a backbone model with a toolkit. We address this gap with AgentSelect, a benchmark that reframes agent selection as narrative query-to-agent recommendation over capability profiles and systematically converts heterogeneous evaluation artifacts into unified, positive-only interaction data. AgentSelect comprises 111,179 queries, 107,721 deployable agents, and 251,103 interaction records aggregated from 40+ sources, spanning LLM-only, toolkit-only, and compositional agents. Our analyses reveal a regime shift from dense head reuse to long-tail,...
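The abstract describes unifying heterogeneous evaluation artifacts into positive-only (query, agent) interaction records, where each agent couples a backbone model with a toolkit and is described by a capability profile. The paper's actual schema is not shown on this page; the following is a hypothetical minimal sketch of such records and of a naive toolkit-overlap recommender standing in for learned query-to-agent recommendation. All field names, class names, and the scoring rule are illustrative assumptions, not the authors' format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Agent:
    # Hypothetical capability profile: a backbone model plus a toolkit.
    agent_id: str
    backbone: str
    toolkit: frozenset  # set of tool names

@dataclass(frozen=True)
class Interaction:
    # Positive-only record: this agent is known to have handled this query.
    query: str
    agent_id: str
    source: str  # the benchmark/leaderboard the record was mined from

def recommend(query_tools, agents, k=1):
    """Rank agents by toolkit overlap with the tools a query needs.
    A deliberately naive stand-in for a learned recommender."""
    scored = sorted(
        agents,
        key=lambda a: len(a.toolkit & query_tools),
        reverse=True,
    )
    return scored[:k]

# Toy candidate pool (illustrative only).
agents = [
    Agent("a1", "llm-x", frozenset({"search", "calculator"})),
    Agent("a2", "llm-y", frozenset({"code_exec"})),
]
top = recommend(frozenset({"search"}), agents, k=1)
```

Because the supervision is positive-only, a real system would have to treat unobserved (query, agent) pairs as unlabeled rather than negative, which is part of what makes the recommendation framing nontrivial.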