AI & ML Breaks Assumption

Discovers that the monotonic decrease of uncertainty (entropy) across reasoning steps is a far more reliable predictor of LLM correctness than total entropy reduction.

March 20, 2026

Original Paper

Entropy trajectory shape predicts LLM reasoning reliability: A diagnostic study of uncertainty dynamics in chain-of-thought

Xinghao Zhao

arXiv · 2603.18940

The Takeaway

The paper challenges the reliance on scalar confidence or aggregate entropy measures for CoT reliability. It shows that the 'shape' of the uncertainty trajectory—specifically whether entropy decreases at every step—is a superior and computationally cheap diagnostic for identifying reasoning failures.

From the abstract

Chain-of-thought (CoT) reasoning improves LLM accuracy, yet detecting failures cheaply remains elusive. We study whether the shape of uncertainty dynamics across reasoning steps--captured by sampling a few answer completions per step--predicts correctness. We introduce entropy-trajectory monotonicity: a chain is monotone if its per-step answer-distribution entropy decreases at every step. On GSM8K (n=300) with Qwen2.5-7B-Instruct, monotone chains achieve 68.8% accuracy vs. 46.8% for non-monotone chains.
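The monotonicity check described in the abstract can be sketched in a few lines: at each reasoning step, sample a handful of answer completions, compute the Shannon entropy of the empirical answer distribution, and flag the chain as monotone only if entropy never increases from one step to the next. This is a minimal illustration, not the paper's implementation; the function names and the choice of non-strict decrease are assumptions.

```python
from collections import Counter
import math

def answer_entropy(samples):
    """Shannon entropy (bits) of the empirical distribution over sampled answers."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def is_monotone(per_step_samples):
    """Return (monotone?, entropies) for a chain.

    per_step_samples: list of lists, one list of sampled answer strings per
    reasoning step. A chain is treated as monotone if entropy never increases
    between consecutive steps (non-strict decrease; the paper may use a
    different convention).
    """
    entropies = [answer_entropy(samples) for samples in per_step_samples]
    monotone = all(later <= earlier for earlier, later in zip(entropies, entropies[1:]))
    return monotone, entropies

# Example: answers converge on "4" -> entropy falls -> monotone chain.
print(is_monotone([["4", "4", "7"], ["4", "4", "4"]]))
# Example: answers diverge -> entropy rises -> non-monotone chain.
print(is_monotone([["4", "4"], ["4", "7"]]))
```

Because the diagnostic only needs a few short answer completions per step rather than full-chain resampling, it stays computationally cheap relative to self-consistency-style approaches.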