[2602.21227] Budget-Aware Agentic Routing via Boundary-Guided Training
Summary
The paper presents Budget-Aware Agentic Routing, a method for optimizing the use of large language models in autonomous agents by balancing cost and performance through dynamic model selection.
Why It Matters
As large language models become integral to autonomous agents, managing operational costs while maintaining task performance is crucial. This research proposes an approach to model routing that addresses the economic sustainability of agentic AI applications, making it relevant to developers and researchers in AI and machine learning.
Key Takeaways
- Introduces Budget-Aware Agentic Routing to optimize model selection based on cost and performance.
- Employs Boundary-Guided Training to enhance learning under sparse rewards.
- Demonstrates improved efficiency in routing while adhering to strict budget constraints.
- Shifts focus from static model selection to dynamic decision-making in AI workflows.
- Establishes a foundational framework for future research in agentic routing.
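The per-step, budget-constrained routing decision described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, per-call costs, and the threshold-based difficulty rule are all assumptions.

```python
# Hypothetical sketch of budget-aware per-step routing between a cheap
# and an expensive model. Costs and the difficulty heuristic are
# illustrative assumptions, not values from the paper.

COST = {"small": 1.0, "large": 10.0}  # assumed per-call costs

def route_step(difficulty: float, remaining_budget: float,
               threshold: float = 0.5) -> str:
    """Pick a model for one agent step.

    Use the expensive model only when the step looks hard AND the
    remaining per-task budget can still cover its cost; otherwise
    fall back to the cheap model.
    """
    if difficulty >= threshold and remaining_budget >= COST["large"]:
        return "large"
    return "small"

def run_episode(difficulties, budget: float):
    """Route a whole trajectory under a strict per-task budget."""
    choices, spent = [], 0.0
    for d in difficulties:
        model = route_step(d, budget - spent)
        spent += COST[model]
        choices.append(model)
    return choices, spent

# Four steps of varying difficulty under a budget of 15 cost units:
choices, spent = run_episode([0.2, 0.9, 0.7, 0.1], budget=15.0)
```

Note how the third step falls back to the cheap model even though it looks hard: after the earlier expensive call, the remaining budget can no longer cover another large-model invocation, which is exactly the path-dependence the paper highlights.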
Computer Science > Computation and Language
arXiv:2602.21227 (cs) [Submitted on 4 Feb 2026]
Authors: Caiqi Zhang, Menglin Xia, Xuchao Zhang, Daniel Madrigal, Ankur Mallick, Samuel Kessler, Victor Ruehle, Saravan Rajmohan
Abstract
As large language models (LLMs) evolve into autonomous agents that execute long-horizon workflows, invoking a high-capability model at every step becomes economically unsustainable. While model routing is effective for single-turn queries, agentic routing is a sequential, path-dependent problem: early mistakes compound, feedback often arrives only at the end of the episode, and deployments often demand strict per-task spending limits. We propose Budget-Aware Agentic Routing, which selects between a cheap and an expensive model at each step to optimize the cost–success frontier and to operate under strict per-task budgets. We propose Boundary-Guided Training, which leverages two boundary policies (always-small vs. always-large) to build a difficulty taxonomy and to anchor learning under sparse rewards. Our approach warm-starts with boundary-guided SFT data synthesis via stratified sampling of cost-efficient trajectories, then applies Boundary-Guided Policy Optimization (BoPO), combining boundary-relative rewards with a reference-guided advantage…
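The boundary-policy idea in the abstract can be made concrete with a small sketch: the two boundary policies (always-small, always-large) induce a difficulty taxonomy per task, and a routed trajectory's return can be scored relative to the two anchors. The normalization scheme and bucket names below are assumptions for illustration; the paper's exact BoPO reward may differ.

```python
# Hypothetical sketch of (a) a difficulty taxonomy derived from the two
# boundary policies and (b) a boundary-relative reward. Both are
# illustrative assumptions, not the authors' exact formulation.

def difficulty_bucket(small_ok: bool, large_ok: bool) -> str:
    """Classify a task by the outcomes of the two boundary policies."""
    if small_ok:
        return "easy"      # the cheap model alone already succeeds
    if large_ok:
        return "hard"      # only the expensive model succeeds
    return "unsolved"      # neither boundary policy succeeds

def boundary_relative_reward(policy_return: float,
                             small_return: float,
                             large_return: float) -> float:
    """Score a routed trajectory relative to the two boundary anchors.

    Returns 0.0 when the policy matches the always-small boundary and
    1.0 when it matches the always-large boundary; values outside
    [0, 1] mean it falls below or exceeds both anchors.
    """
    span = large_return - small_return
    if abs(span) < 1e-8:   # boundaries agree: no useful anchor signal
        return 0.0
    return (policy_return - small_return) / span

# A trajectory recovering 80% of the gap between the two boundaries:
r = boundary_relative_reward(policy_return=0.8,
                             small_return=0.0, large_return=1.0)
```

Anchoring rewards between the two boundaries gives the policy a dense, well-scaled signal even when the raw task reward is sparse, which is the role the abstract assigns to the boundary policies.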