A single mathematical quantity, spectral entropy, may now signal when an AI model's 'aha!' moment is about to occur.
April 16, 2026
Original Paper
Spectral Entropy Collapse as an Empirical Signature of Delayed Generalisation in Grokking
arXiv · 2604.13123
The Takeaway
Grokking has long been one of AI's black boxes: a sudden, seemingly inexplicable jump in test performance long after a model has memorized its training data. This paper identifies a specific scalar signature, 'spectral entropy collapse,' that occurs just before a model generalizes, providing the first predictive early-warning signal for such breakthroughs during training. Instead of simply hoping a model will learn a complex task, researchers can track this entropy collapse to judge whether generalization is imminent. That matters for training efficiency: engineers can decide whether to keep spending compute or cut their losses based on a measurable quantity rather than intuition. It turns a philosophical mystery into an engineering metric.
From the abstract
Grokking -- delayed generalisation long after memorisation -- lacks a predictive mechanistic explanation. We identify the normalised spectral entropy $\tilde{H}(t)$ of the representation covariance as a scalar order parameter for this transition, validated on 1-layer Transformers on group-theoretic tasks. Five contributions: (i) Grokking follows a two-phase pattern: norm expansion then entropy collapse. (ii) $\tilde{H}$ crosses a stable threshold $\tilde{H}^* \approx 0.61$ before generalisation
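To make the order parameter concrete, here is a minimal sketch of how one might compute the normalised spectral entropy of a representation covariance matrix. The exact definition is not spelled out in the excerpt above, so this assumes the common convention: the Shannon entropy of the covariance eigenvalue distribution, divided by $\log d$ so that $\tilde{H} \in [0, 1]$. The function name and the choice of NumPy are illustrative, not taken from the paper.

```python
import numpy as np

def normalized_spectral_entropy(reps: np.ndarray) -> float:
    """Normalised spectral entropy of the covariance of `reps`.

    `reps` has shape (n_samples, d): one hidden representation per row.
    Assumed definition: Shannon entropy of the normalised eigenvalue
    spectrum of the covariance, divided by log(d) so the result is in [0, 1].
    """
    centered = reps - reps.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / len(reps)   # (d, d) covariance matrix
    eigvals = np.linalg.eigvalsh(cov)         # real eigenvalues, ascending
    eigvals = np.clip(eigvals, 0.0, None)     # guard against tiny negatives
    p = eigvals / eigvals.sum()               # treat spectrum as a distribution
    p = p[p > 0]                              # avoid log(0)
    entropy = -(p * np.log(p)).sum()
    return float(entropy / np.log(cov.shape[0]))

# Intuition check: isotropic representations spread variance across all
# directions (entropy near 1); low-rank representations concentrate it
# in a few directions (entropy much lower), as in "entropy collapse".
rng = np.random.default_rng(0)
iso = rng.standard_normal((1000, 64))
low_rank = rng.standard_normal((1000, 2)) @ rng.standard_normal((2, 64))
print(normalized_spectral_entropy(iso))       # close to 1
print(normalized_spectral_entropy(low_rank))  # well below the iso value
```

Tracked over training steps, a curve of this quantity dropping through a threshold like $\tilde{H}^* \approx 0.61$ is the kind of early-warning signal the paper describes.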