AI & ML Efficiency Breakthrough

Enables cost-aware prompt routing by predicting the reward a model's response is expected to earn, before any response is generated.

March 24, 2026

Original Paper

Expected Reward Prediction, with Applications to Model Routing

Kenan Hasanaliyev, Silas Alberti, Jenny Hamer, Dheeraj Rajagopal, Kevin Robinson, Jasper Snoek, Victor Veitch, Alexander Nicholas D'Amour

arXiv · 2603.20217

The Takeaway

Practitioners can now systematically route queries to the cheapest model likely to succeed based on predicted reward, rather than relying on category-level heuristics or expensive multi-model sampling.

From the abstract

Reward models are a standard tool to score responses from LLMs. Reward models are built to rank responses to a fixed prompt sampled from a single model, for example to choose the best of n sampled responses. In this paper, we study whether scores from response-level reward models can be lifted to score a model's suitability for a prompt, prior to seeing responses from that model. Specifically, we show that it is straightforward to predict the expected reward that an LLM would earn from the reward model on a given prompt, before any response is generated.
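The routing idea above can be sketched in a few lines: given a per-query expected-reward predictor for each candidate model, send the prompt to the cheapest model whose predicted reward clears a quality threshold, falling back to the strongest model otherwise. This is a minimal illustration, not the paper's implementation; the model names, costs, threshold, and toy predictors below are all hypothetical.

```python
# Hypothetical sketch of reward-based model routing. Each candidate carries
# a per-query cost and a predictor that estimates expected reward *before*
# any response is generated (the paper's core capability).

def route(prompt, models, threshold=0.7):
    """Return the name of the cheapest model whose predicted reward
    clears the threshold; fall back to the most expensive model."""
    # Try candidates from cheapest to most expensive.
    for model in sorted(models, key=lambda m: m["cost"]):
        if model["predict_reward"](prompt) >= threshold:
            return model["name"]
    # No cheap model is predicted to succeed: use the strongest one.
    return max(models, key=lambda m: m["cost"])["name"]

# Toy predictors standing in for a learned expected-reward model:
# here the small model is only trusted on short prompts.
models = [
    {"name": "small", "cost": 1.0,
     "predict_reward": lambda p: 0.9 if len(p) < 50 else 0.4},
    {"name": "large", "cost": 10.0,
     "predict_reward": lambda p: 0.85},
]

print(route("What is 2 + 2?", models))  # short prompt -> routed to "small"
print(route("Explain " + "x" * 60, models))  # long prompt -> routed to "large"
```

In practice the predictors would be learned regressors over prompt features, and the threshold would be tuned against the cost/quality trade-off the deployment requires.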