Suggests that Transformers can internalize complex search algorithms like MCTS directly into their weights.
March 27, 2026
Original Paper
Transformers in the Dark: Navigating Unknown Search Spaces via Bandit Feedback
arXiv · 2603.24780
The Takeaway
This suggests a future where LLMs don't need external search 'scaffolding' or bandit feedback loops, as the architecture itself can learn to approximate optimal search strategies over unknown spaces.
From the abstract
Effective problem solving with Large Language Models (LLMs) can be enhanced when they are paired with external search algorithms. By viewing the space of diverse ideas and their follow-up possibilities as a tree structure, the search algorithm can navigate such a search space and guide the LLM toward better solutions more efficiently. While the search algorithm enables an effective balance between exploitation and exploration of a tree-structured space, the need for an external component can com…
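To make the abstract's framing concrete, here is a minimal sketch of the kind of external bandit-based search scaffolding it describes: a UCB1 rule balancing exploitation (average observed reward of a candidate branch) against exploration (trying under-visited branches). All names (`ucb1`, `bandit_search`) and the reward setup are hypothetical illustrations, not the paper's actual algorithm.

```python
import math
import random

def ucb1(total_value, visits, parent_visits, c=1.4):
    """UCB1 score: mean reward plus an exploration bonus that
    shrinks as a branch is visited more often."""
    if visits == 0:
        return float("inf")  # force each branch to be tried at least once
    return total_value / visits + c * math.sqrt(math.log(parent_visits) / visits)

def bandit_search(branch_rewards, n_rounds=1000, seed=0):
    """Repeatedly pick the branch with the highest UCB1 score,
    observe a noisy reward, and return per-branch visit counts."""
    rng = random.Random(seed)
    n = len(branch_rewards)
    visits = [0] * n
    values = [0.0] * n
    for t in range(1, n_rounds + 1):
        scores = [ucb1(values[i], visits[i], t) for i in range(n)]
        best = scores.index(max(scores))
        reward = branch_rewards[best] + rng.gauss(0, 0.1)  # bandit feedback
        visits[best] += 1
        values[best] += reward
    return visits

# The highest-reward branch should accumulate the most visits.
visits = bandit_search([0.2, 0.8, 0.5])
```

The paper's claim, per the takeaway above, is that a Transformer can learn to approximate this kind of exploration/exploitation policy in its weights, removing the need for the external loop.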