Moves beyond next-token prediction to model reasoning as gradient-based energy minimization over latent trajectories.
March 31, 2026
Original Paper
Reasoning as Energy Minimization over Structured Latent Trajectories
arXiv · 2603.28248
The Takeaway
Instead of emitting discrete chain-of-thought tokens, this method treats reasoning as an optimization process in a continuous latent space. It provides a scalar measure of reasoning progress and allows thoughts to be iteratively refined before decoding, addressing a major limitation of current single-shot LLM decoders.
From the abstract
Single-shot neural decoders commit to answers without iterative refinement, while chain-of-thought methods introduce discrete intermediate steps but lack a scalar measure of reasoning progress. We propose Energy-Based Reasoning via Structured Latent Planning (EBRM), which models reasoning as gradient-based optimization of a multi-step latent trajectory $z_{1:T}$ under a learned energy function $E(h_x, z)$. The energy decomposes into per-step compatibility, transition consistency, and trajectory […]