[2604.00136] ParetoBandit: Budget-Paced Adaptive Routing for Non-Stationary LLM Serving

arXiv - Machine Learning · April 02, 2026 · 4 min read

About this article

Abstract page for arXiv paper 2604.00136: ParetoBandit: Budget-Paced Adaptive Routing for Non-Stationary LLM Serving

Computer Science > Machine Learning

arXiv:2604.00136 (cs)

[Submitted on 31 Mar 2026]

Title: ParetoBandit: Budget-Paced Adaptive Routing for Non-Stationary LLM Serving

Authors: Annette Taberner-Miller

Abstract: Production LLM serving often relies on multi-model portfolios spanning a ~530x cost range, where routing decisions trade off quality against cost. This trade-off is non-stationary: providers revise pricing, model quality can regress silently, and new models must be integrated without downtime. We present ParetoBandit, an open-source adaptive router built on cost-aware contextual bandits that is the first to simultaneously enforce dollar-denominated budgets, adapt online to such shifts, and onboard new models at runtime. ParetoBandit closes these gaps through three mechanisms. An online primal-dual budget pacer enforces a per-request cost ceiling over an open-ended stream, replacing offline penalty tuning with closed-loop control. Geometric forgetting on sufficient statistics enables rapid adaptation to price and quality shifts while bootstrapping from offline priors. A hot-swap registry lets operators add or remove models at runtime, with a brief forced-exploration phase for each newcomer, after which UCB selection discovers its quality-cost niche from live traffic alone. We evaluate ParetoBandit across four d...
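To make the three mechanisms named in the abstract more concrete, here is a minimal toy sketch in Python. It is not the paper's implementation: it drops the context features (the paper uses contextual bandits), treats per-request costs as known constants, and the class name `PacedDiscountedUCB`, its parameters, and the exact update rules are all illustrative assumptions. It only shows how a dual variable can pace spending, how geometric forgetting discounts old statistics, and how a newly registered model gets tried first.

```python
import math


class PacedDiscountedUCB:
    """Toy, context-free router: discounted UCB plus a primal-dual cost pacer.

    Hypothetical illustration only; names and update rules are assumptions.
    """

    def __init__(self, costs, budget_per_request, gamma=0.99, dual_lr=0.05, explore=1.0):
        self.costs = dict(costs)          # model name -> cost per request (assumed known)
        self.budget = budget_per_request  # target average spend per request
        self.gamma = gamma                # geometric forgetting factor in (0, 1]
        self.dual_lr = dual_lr            # step size for the dual variable
        self.explore = explore            # scale of the UCB exploration bonus
        self.lam = 0.0                    # dual variable: current "price" of spending
        self.n = {m: 0.0 for m in self.costs}  # discounted pull counts
        self.s = {m: 0.0 for m in self.costs}  # discounted reward sums

    def add_model(self, name, cost):
        # Hot-swap: a newcomer starts with no evidence, so select() tries it first.
        self.costs[name] = cost
        self.n[name] = 0.0
        self.s[name] = 0.0

    def remove_model(self, name):
        for table in (self.costs, self.n, self.s):
            table.pop(name, None)

    def select(self):
        total = sum(self.n.values())
        best, best_score = None, -math.inf
        for m, cost in self.costs.items():
            if self.n[m] < 1e-12:
                return m  # forced exploration for arms with no remaining evidence
            mean = self.s[m] / self.n[m]
            bonus = self.explore * math.sqrt(math.log(1.0 + total) / self.n[m])
            # Lagrangian score: estimated quality minus priced, budget-normalized cost.
            score = mean + bonus - self.lam * cost / self.budget
            if score > best_score:
                best, best_score = m, score
        return best

    def update(self, model, reward):
        # Geometric forgetting: shrink every statistic before adding the new sample.
        for m in self.n:
            self.n[m] *= self.gamma
            self.s[m] *= self.gamma
        self.n[model] += 1.0
        self.s[model] += reward
        # Dual ascent: raise the price after an over-budget request, lower it otherwise.
        overspend = self.costs[model] / self.budget - 1.0
        self.lam = max(0.0, self.lam + self.dual_lr * overspend)
```

In use, a caller would alternate `select()` and `update(model, reward)` per request, where `reward` is some quality score in [0, 1]. The design choice worth noting is that the cost penalty is not a hand-tuned constant: `lam` rises while the router overspends and decays toward zero while it underspends, which is the closed-loop behaviour the abstract contrasts with offline penalty tuning.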

Originally published on April 02, 2026. Curated by AI News.
