A production-ready adaptive router for LLM portfolios that manages cost-quality trade-offs in real time under strict dollar budgets.
April 2, 2026
Original Paper
ParetoBandit: Budget-Paced Adaptive Routing for Non-Stationary LLM Serving
arXiv · 2604.00136
The Takeaway
According to the authors, ParetoBandit is the first open-source router that adapts to non-stationary conditions, such as pricing shifts or silent model regressions, while enforcing a hard cost ceiling. That makes multi-model deployments (e.g., GPT-4o mixed with Haiku) viable for budget-constrained production apps.
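To make the idea concrete, here is a minimal sketch of budget-paced routing. It is not the paper's algorithm or API: it swaps in a simple discounted UCB bandit with no request context, and every name in it (`Arm`, `BudgetPacedRouter`, `simulate`), along with the prices and quality values, is a hypothetical chosen for illustration. Discounting old feedback lets the router track drift, a cost multiplier `lam` rises when spending runs ahead of the pacing target, and a filter on the remaining budget enforces the ceiling.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Arm:
    name: str
    cost: float           # assumed flat dollars per request (illustrative)
    pulls: float = 0.0    # discounted number of times this arm was used
    reward: float = 0.0   # discounted sum of observed quality scores

    def mean(self) -> float:
        return self.reward / self.pulls if self.pulls > 0 else 0.0


@dataclass
class BudgetPacedRouter:
    arms: List[Arm]
    total_budget: float         # hard ceiling in dollars
    target_per_request: float   # pacing target: average dollars per request
    discount: float = 0.99      # below 1.0 so stale feedback fades out
    step: float = 0.01          # step size for the cost multiplier
    explore: float = 0.5        # width of the exploration bonus
    lam: float = 0.0            # current price placed on cost
    spent: float = 0.0
    served: int = 0

    def choose(self) -> Optional[Arm]:
        # Hard ceiling: consider only arms that fit in the remaining budget.
        affordable = [a for a in self.arms
                      if self.spent + a.cost <= self.total_budget]
        if not affordable:
            return None
        total = sum(a.pulls for a in self.arms)

        def score(a: Arm) -> float:
            if a.pulls < 1e-9:
                return math.inf  # try every arm at least once
            bonus = self.explore * math.sqrt(math.log(total + 1.0) / a.pulls)
            # Estimated quality, plus exploration, minus priced cost.
            return a.mean() + bonus - self.lam * a.cost / self.target_per_request

        return max(affordable, key=score)

    def update(self, arm: Arm, quality: float) -> None:
        # Decay all statistics so recent feedback dominates.
        for a in self.arms:
            a.pulls *= self.discount
            a.reward *= self.discount
        arm.pulls += 1.0
        arm.reward += quality
        self.spent += arm.cost
        self.served += 1
        # Raise the cost price after an over-target request, lower it otherwise.
        over = (arm.cost - self.target_per_request) / self.target_per_request
        self.lam = max(0.0, self.lam + self.step * over)


def simulate(requests: int = 1000, seed: int = 0) -> BudgetPacedRouter:
    rng = random.Random(seed)
    true_quality = {"cheap": 0.6, "strong": 0.9}  # made-up success rates
    router = BudgetPacedRouter(
        arms=[Arm("cheap", 0.001), Arm("strong", 0.05)],
        total_budget=5.0,
        target_per_request=0.005,
    )
    for t in range(requests):
        if t == requests // 2:
            true_quality["strong"] = 0.5  # simulated silent regression
        arm = router.choose()
        if arm is None:  # budget exhausted
            break
        quality = 1.0 if rng.random() < true_quality[arm.name] else 0.0
        router.update(arm, quality)
    return router
```

Calling `simulate()` runs the toy loop and returns the router, whose `spent`, `served`, and `lam` fields can be inspected. The pacing term only steers average spend toward the target; the guarantee that `spent` never exceeds `total_budget` comes from the affordability filter in `choose`.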
From the abstract
Production LLM serving often relies on multi-model portfolios spanning a ~530x cost range, where routing decisions trade off quality against cost. This trade-off is non-stationary: providers revise pricing, model quality can regress silently, and new models must be integrated without downtime. We present ParetoBandit, an open-source adaptive router built on cost-aware contextual bandits that is the first to simultaneously enforce dollar-denominated budgets, adapt online to such shifts, and onboa