[2604.00136] ParetoBandit: Budget-Paced Adaptive Routing for Non-Stationary LLM Serving

Computer Science > Machine Learning

arXiv:2604.00136 (cs) [Submitted on 31 Mar 2026]

Title: ParetoBandit: Budget-Paced Adaptive Routing for Non-Stationary LLM Serving

Authors: Annette Taberner-Miller

Abstract: Production LLM serving often relies on multi-model portfolios spanning a ~530x cost range, where routing decisions trade off quality against cost. This trade-off is non-stationary: providers revise pricing, model quality can regress silently, and new models must be integrated without downtime. We present ParetoBandit, an open-source adaptive router built on cost-aware contextual bandits that is the first to simultaneously enforce dollar-denominated budgets, adapt online to such shifts, and onboard new models at runtime. ParetoBandit closes these gaps through three mechanisms. An online primal-dual budget pacer enforces a per-request cost ceiling over an open-ended stream, replacing offline penalty tuning with closed-loop control. Geometric forgetting on sufficient statistics enables rapid adaptation to price and quality shifts while bootstrapping from offline priors. A hot-swap registry lets operators add or remove models at runtime, with a brief forced-exploration phase for each newcomer, after which UCB selection discovers its quality-cost niche from live traffic alone. We evaluate ParetoBandit across four d...
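
To make the three mechanisms in the abstract concrete, here is a minimal sketch in Python. It is not the paper's implementation: it drops the contextual features and uses a plain discounted UCB per model, and every name in it (`PacedRouter`, `add_model`, `simulate`, the parameters `gamma`, `eta`, `forced_pulls`) is hypothetical. Costs are assumed to be normalized to a small range so that a single dual step size is reasonable.

```python
import math


class PacedRouter:
    """Toy budget-paced router (illustrative only, not ParetoBandit itself)."""

    def __init__(self, budget_per_request, gamma=0.99, eta=0.05, forced_pulls=3):
        self.budget = budget_per_request  # target average cost per request
        self.gamma = gamma                # geometric forgetting factor in (0, 1]
        self.eta = eta                    # step size of the dual update
        self.forced_pulls = forced_pulls  # forced-exploration pulls per newcomer
        self.lam = 0.0                    # dual variable: current price of cost
        self.arms = {}                    # model name -> statistics

    def add_model(self, name, cost):
        # Hot-swap registry: a newcomer starts with empty statistics and a
        # short forced-exploration quota.
        self.arms[name] = {"cost": cost, "n": 0.0, "s": 0.0,
                           "forced": self.forced_pulls}

    def remove_model(self, name):
        self.arms.pop(name, None)

    def select(self):
        # Any model still in its forced-exploration phase is served first.
        for name, arm in self.arms.items():
            if arm["forced"] > 0:
                return name
        total = sum(arm["n"] for arm in self.arms.values())

        def score(arm):
            n = max(arm["n"], 1e-9)  # guard against division by zero
            mean = arm["s"] / n
            bonus = math.sqrt(2.0 * math.log(total + 1.0) / n)
            # UCB on quality, minus the cost priced by the dual variable.
            return mean + bonus - self.lam * arm["cost"]

        return max(self.arms, key=lambda k: score(self.arms[k]))

    def update(self, name, reward, cost=None):
        # Geometric forgetting: decay every model's sufficient statistics.
        for arm in self.arms.values():
            arm["n"] *= self.gamma
            arm["s"] *= self.gamma
        arm = self.arms[name]
        if cost is not None:
            arm["cost"] = cost  # a provider price revision
        arm["n"] += 1.0
        arm["s"] += reward
        arm["forced"] = max(0, arm["forced"] - 1)
        # Primal-dual pacing: raise the price of cost when the request was
        # over budget, lower it (down to zero) when it was under.
        self.lam = max(0.0, self.lam + self.eta * (arm["cost"] - self.budget))


def simulate(router, qualities, steps):
    """Route `steps` requests with fixed per-model quality; return mean cost."""
    spent = 0.0
    for _ in range(steps):
        name = router.select()
        spent += router.arms[name]["cost"]
        router.update(name, qualities[name])
    return spent / steps
```

Because the dual variable starts at zero and is only ever clipped upward, the average cost over T requests in this sketch is at most `budget + lam / (eta * T)`, which is the sense in which the closed loop replaces a hand-tuned penalty. A registered newcomer is selected on the very next call to `select()` until its forced-exploration quota is used up.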

Originally published on April 02, 2026. Curated by AI News.
