AdaAnchor enables LLMs to perform multi-step reasoning entirely in latent space with an adaptive halting mechanism to optimize compute.
March 17, 2026
Original Paper
Thinking in Latents: Adaptive Anchor Refinement for Implicit Reasoning in LLMs
arXiv · 2603.15051
The Takeaway
AdaAnchor addresses the heavy overhead of chain-of-thought token generation by shifting reasoning into hidden representations. Its adaptive halting mechanism reduces latent refinement steps by up to 60% compared to fixed-step methods, making 'internal thinking' practical for real-time inference.
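The adaptive halting idea can be illustrated with an ACT-style loop: repeatedly refine a latent state, accumulate a halting probability from a small halting head, and stop early once confidence crosses a threshold rather than always running a fixed number of steps. The sketch below is illustrative only; `refine` and `halt_prob` are hypothetical stand-ins for the paper's learned refinement network and halting head.

```python
import math

def refine(h):
    # Hypothetical latent refinement step: a stand-in for the learned
    # update network, nudging the state toward a fixed point.
    return [0.5 * x + 0.1 for x in h]

def halt_prob(h):
    # Hypothetical halting head: sigmoid of the mean activation.
    m = sum(h) / len(h)
    return 1.0 / (1.0 + math.exp(-4.0 * (m - 0.15)))

def adaptive_refine(h, max_steps=16, threshold=0.99):
    """ACT-style adaptive halting: refine the latent state until the
    cumulative halting probability crosses `threshold`, or `max_steps`
    is reached, instead of always running all steps."""
    cum = 0.0
    for step in range(1, max_steps + 1):
        h = refine(h)
        p = halt_prob(h)
        cum = cum + (1.0 - cum) * p  # prob. of having halted by now
        if cum >= threshold:
            break
    return h, step

h, steps = adaptive_refine([1.0, -0.5, 0.3])
print(steps)  # typically halts well before max_steps
```

The compute saving comes from the early exit: easy inputs cross the halting threshold in a few refinement steps, while harder ones use the full budget.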
From the abstract
Token-level Chain-of-Thought (CoT) prompting has become a standard way to elicit multi-step reasoning in large language models (LLMs), especially for mathematical word problems. However, generating long intermediate traces increases output length and inference cost, and can be inefficient when the model could arrive at the correct answer without extensive verbalization. This has motivated latent-space reasoning approaches that shift computation into hidden representations and only emit a final answer.