Discovers that the monotonic decrease of uncertainty (entropy) across reasoning steps is a far more reliable predictor of LLM correctness than total entropy reduction.
March 20, 2026
Original Paper
Entropy trajectory shape predicts LLM reasoning reliability: A diagnostic study of uncertainty dynamics in chain-of-thought
arXiv · 2603.18940
The Takeaway
The paper challenges the reliance on scalar confidence or aggregate entropy measures for CoT reliability. It shows that the 'shape' of the uncertainty trajectory—specifically whether entropy decreases at every step—is a superior and computationally cheap diagnostic for identifying reasoning failures.
From the abstract
Chain-of-thought (CoT) reasoning improves LLM accuracy, yet detecting failures cheaply remains elusive. We study whether the shape of uncertainty dynamics across reasoning steps, captured by sampling a few answer completions per step, predicts correctness. We introduce entropy-trajectory monotonicity: a chain is monotone if its per-step answer-distribution entropy decreases at every step. On GSM8K (n=300) with Qwen2.5-7B-Instruct, monotone chains achieve 68.8% accuracy vs. 46.8% for non-monotone chains.
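The diagnostic described in the abstract can be sketched in a few lines: sample several answer completions at each reasoning step, compute the Shannon entropy of the empirical answer distribution, and flag the chain as monotone if that entropy decreases at every step. This is a minimal illustration, not the authors' code; in particular, whether the paper requires a strict or non-strict decrease is an assumption here (strict is used below).

```python
import math
from collections import Counter


def answer_entropy(answers: list[str]) -> float:
    """Shannon entropy (bits) of the empirical answer distribution
    over a small sample of answer completions at one reasoning step."""
    counts = Counter(answers)
    total = len(answers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def is_monotone(per_step_answers: list[list[str]]) -> bool:
    """True if per-step answer entropy strictly decreases at every step.
    (Strict decrease is an assumption; the paper may allow ties.)"""
    entropies = [answer_entropy(step) for step in per_step_answers]
    return all(later < earlier for earlier, later in zip(entropies, entropies[1:]))


# Hypothetical example: 4 sampled answers at each of 3 CoT steps.
# Entropy falls 1.5 -> ~0.81 -> 0.0 bits, so the chain is monotone.
steps = [["7", "9", "7", "3"], ["7", "7", "9", "7"], ["7", "7", "7", "7"]]
print(is_monotone(steps))  # → True
```

The check requires only a handful of extra completions per step, which is what makes the diagnostic computationally cheap relative to full self-consistency sampling.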