Proposes a mathematical framework where 'spectral gaps' in parameter updates control phase transitions like grokking and loss plateaus.
April 1, 2026
Original Paper
The Spectral Edge Thesis: A Mathematical Framework for Intra-Signal Phase Transitions in Neural Network Training
arXiv · 2603.28964
The Takeaway
The thesis provides a predictive, theoretical basis for understanding non-linear training dynamics that were previously considered mysterious or stochastic. Researchers can use this 'Spectral Edge' lens to monitor circuit stability and anticipate capability gains before they happen.
From the abstract
We develop the spectral edge thesis: phase transitions in neural network training -- grokking, capability gains, loss plateaus -- are controlled by the spectral gap of the rolling-window Gram matrix of parameter updates. In the extreme aspect ratio regime (parameters $P \sim 10^8$, window $W \sim 10$), the classical BBP detection threshold is vacuous; the operative structure is the intra-signal gap separating dominant from subdominant modes at position $k^* = \mathrm{argmax}_j\, \sigma_j/\sigma_{j+1}$.
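The quantity in the abstract is cheap to compute in practice: even with $P \sim 10^8$ parameters, the $W \times W$ Gram matrix of a short rolling window of flattened updates has the same nonzero eigenvalues as $U U^\top$, so the singular values $\sigma_j$ and the gap position $k^*$ fall out of a tiny eigenproblem. The sketch below is an illustrative implementation of that idea, not the authors' code; the function name `spectral_edge` and the ridge constant guarding against division by zero are my own assumptions.

```python
import numpy as np

def spectral_edge(updates):
    """Locate the intra-signal gap k* = argmax_j sigma_j / sigma_{j+1}
    from a rolling window of flattened parameter updates.

    updates: array of shape (W, P) -- W recent update vectors, P parameters.
    Returns (k_star, sigma): the 1-indexed gap position and the singular
    values of the update matrix, sorted in descending order.
    """
    U = np.asarray(updates, dtype=np.float64)
    # The W x W Gram matrix shares its nonzero spectrum with U's squared
    # singular values, so this stays cheap even when P ~ 1e8.
    G = U @ U.T
    eigvals = np.linalg.eigvalsh(G)[::-1]          # descending order
    sigma = np.sqrt(np.clip(eigvals, 0.0, None))   # singular values of U
    # Ratio of consecutive singular values; the tiny floor (an assumption,
    # not from the paper) avoids division by a numerically zero sigma.
    ratios = sigma[:-1] / np.maximum(sigma[1:], 1e-30)
    k_star = int(np.argmax(ratios)) + 1            # 1-indexed gap position
    return k_star, sigma
```

On a synthetic window whose updates share two strong directions plus small noise, the detector places $k^*$ at 2, separating the dominant two-mode signal from the noise floor.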