[2603.19415] Scalable Prompt Routing via Fine-Grained Latent Task Discovery
Computer Science > Computation and Language
arXiv:2603.19415 (cs)
[Submitted on 19 Mar 2026]

Title: Scalable Prompt Routing via Fine-Grained Latent Task Discovery

Authors: Yunyi Zhang, Soji Adeshina, Patrick Guan, Ashwin Ganesh, Zhen Han, Vassilis N. Ioannidis, Huzefa Rangwala, George Karypis

Abstract: Prompt routing dynamically selects the most appropriate large language model from a pool of candidates for each query, optimizing performance while managing costs. As model pools scale to include dozens of frontier models with narrow performance gaps, existing approaches face significant challenges: manually defined task taxonomies cannot capture fine-grained capability distinctions, while monolithic routers struggle to differentiate subtle differences across diverse tasks. We propose a two-stage routing architecture that addresses these limitations through automated fine-grained task discovery and task-aware quality estimation. Our first stage employs graph-based clustering to discover latent task types and trains a classifier to assign prompts to discovered tasks. The second stage uses a mixture-of-experts architecture with task-specific prediction heads for specialized quality estimates. At inference, we aggregate predictions from both stages to balance task-level stability with prompt-specific adaptability. Evaluated ...
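
The inference-time aggregation of the two stages can be illustrated with a minimal Python sketch. The abstract does not state the aggregation rule, so the convex combination below, the weight `alpha`, and all names (`route`, `task_probs`, `task_level_quality`, `prompt_level_quality`) are assumptions made for illustration, not the authors' method. Stage 1 is represented by a probability distribution over discovered tasks together with a per-task quality table, and stage 2 by per-model quality estimates for the specific prompt, as the task-specific prediction heads might produce.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RoutingDecision:
    model: str                # name of the selected model
    scores: Dict[str, float]  # aggregated score per candidate model


def route(
    task_probs: List[float],
    task_level_quality: List[Dict[str, float]],
    prompt_level_quality: Dict[str, float],
    alpha: float = 0.5,
) -> RoutingDecision:
    """Blend task-level and prompt-level quality estimates (hypothetical rule).

    task_probs[t]            -- stage-1 classifier probability of task t
    task_level_quality[t][m] -- average quality of model m on task t
    prompt_level_quality[m]  -- stage-2 estimate for model m on this prompt
    alpha                    -- assumed weight on the task-level estimate
    """
    scores: Dict[str, float] = {}
    for model in prompt_level_quality:
        # Expected task-level quality under the stage-1 task distribution.
        task_score = sum(
            p * quality[model]
            for p, quality in zip(task_probs, task_level_quality)
        )
        # Assumed convex combination of the two estimates.
        scores[model] = alpha * task_score + (1 - alpha) * prompt_level_quality[model]
    best = max(scores, key=scores.get)
    return RoutingDecision(model=best, scores=scores)


decision = route(
    task_probs=[0.8, 0.2],
    task_level_quality=[{"a": 0.9, "b": 0.5}, {"a": 0.4, "b": 0.8}],
    prompt_level_quality={"a": 0.6, "b": 0.7},
    alpha=0.5,
)
print(decision.model)
```

Under this assumed rule, a larger `alpha` leans on the task-level estimate (more stable across prompts), while a smaller `alpha` leans on the prompt-specific estimate. A cost term could be subtracted from each score to reflect the cost management mentioned in the abstract, but that is likewise not specified in the text.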