Restores monotonic scaling in LLM tree search by replacing standard MCTS selection with Gumbel sampling and Sequential Halving.
March 24, 2026
Original Paper
Revisiting Tree Search for LLMs: Gumbel and Sequential Halving for Budget-Scalable Reasoning
arXiv · 2603.21162
The Takeaway
Standard AlphaZero-style tree search for LLM reasoning can degrade as the search budget grows. This 'ReSCALE' approach restores monotonic scaling, so accuracy actually improves as more compute is allocated at inference, a vital property for 'o1-style' reasoning models.
From the abstract
Neural tree search is a powerful decision-making algorithm widely used in complex domains such as game playing and model-based reinforcement learning. Recent work has applied AlphaZero-style tree search to enhance the reasoning capabilities of Large Language Models (LLMs) during inference, but we find that this approach suffers from a scaling failure: on GSM8K and Game24, accuracy drops as the search budget increases. In this paper, we present ReSCALE, an adaptation of Gumbel AlphaZero MCTS that […]
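To make the two ingredients named in the title concrete, here is a minimal, hypothetical sketch of root-action selection via the Gumbel-Top-k trick combined with Sequential Halving (the mechanism behind Gumbel AlphaZero). All names and the scoring rule are illustrative simplifications, not the paper's implementation; in particular, real Gumbel MCTS transforms the value estimates before adding them to the Gumbel-perturbed logits.

```python
import math
import random

def gumbel_sequential_halving(logits, simulate, budget, m=4):
    """Pick an action via Gumbel top-m sampling + Sequential Halving.

    logits   -- prior policy logits, one per candidate action
    simulate -- callable(action) -> scalar value estimate (one rollout)
    budget   -- total number of simulations to spend
    m        -- number of actions initially kept after Gumbel sampling
    """
    # Gumbel-Top-k trick: perturb each logit with Gumbel(0, 1) noise and
    # take the top-m indices; this samples m actions without replacement
    # in proportion to the softmax of the logits.
    gumbels = [logits[a] - math.log(-math.log(random.random()))
               for a in range(len(logits))]
    candidates = sorted(range(len(logits)),
                        key=lambda a: gumbels[a], reverse=True)[:m]

    visits = {a: 0 for a in candidates}
    values = {a: 0.0 for a in candidates}

    # Sequential Halving: spend the budget in log2(m) rounds, halving the
    # candidate set after each round so survivors get more simulations.
    rounds = max(1, math.ceil(math.log2(m)))
    for _ in range(rounds):
        per_action = max(1, budget // (rounds * len(candidates)))
        for a in candidates:
            for _ in range(per_action):
                values[a] += simulate(a)
                visits[a] += 1
        # Rank by Gumbel-perturbed prior plus the empirical mean value,
        # then keep the top half of the candidates.
        def score(a):
            return gumbels[a] + values[a] / visits[a]
        candidates = sorted(candidates, key=score,
                            reverse=True)[:max(1, len(candidates) // 2)]

    return candidates[0]
```

Because the budget is split across a fixed number of halving rounds, every unit of extra compute goes toward more reliable value estimates for the surviving actions, which is the property that makes accuracy scale with budget rather than degrade.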