Replaces the heuristic constant momentum (0.9) with a parameter-free, physics-inspired schedule that speeds up convergence by nearly 2x.
April 1, 2026
Original Paper
Beta-Scheduling: Momentum from Critical Damping as a Diagnostic and Correction Tool for Neural Network Training
arXiv · 2603.28921
The Takeaway
The paper challenges the 60-year convention of fixed momentum in neural network training. By deriving momentum from critical damping, it provides a principled way to accelerate training and a new diagnostic tool for localizing layer-wise failure modes in models.
From the abstract
Standard neural network training uses constant momentum (typically 0.9), a convention dating to 1964 with limited theoretical justification for its optimality. We derive a time-varying momentum schedule from the critically damped harmonic oscillator: mu(t) = 1 - 2*sqrt(alpha(t)), where alpha(t) is the current learning rate. This beta-schedule requires zero free parameters beyond the existing learning rate schedule. On ResNet-18/CIFAR-10, beta-scheduling delivers 1.9x faster convergence to 90% accuracy.
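The schedule above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the cosine learning-rate schedule and its parameters (`alpha_max`, `alpha_min`, `total_steps`) are assumptions made here for demonstration; only the mapping mu(t) = 1 - 2*sqrt(alpha(t)) comes from the abstract.

```python
import math

def critically_damped_momentum(alpha: float) -> float:
    """Momentum from critical damping: mu(t) = 1 - 2*sqrt(alpha(t)).

    alpha is the current learning rate, so the schedule adds no free
    parameters beyond the existing learning-rate schedule.
    """
    return 1.0 - 2.0 * math.sqrt(alpha)

def cosine_lr(step: int, total_steps: int,
              alpha_max: float = 0.1, alpha_min: float = 1e-4) -> float:
    """Hypothetical cosine learning-rate schedule, used only for illustration."""
    cos = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return alpha_min + (alpha_max - alpha_min) * cos

# As the learning rate decays, the derived momentum rises toward 1,
# recovering values near the conventional 0.9 late in training.
for step in (0, 500, 1000):
    a = cosine_lr(step, 1000)
    print(f"step {step:4d}: lr={a:.4f}  momentum={critically_damped_momentum(a):.4f}")
```

Note that with a large initial learning rate the derived momentum is much smaller than 0.9, and it only approaches the conventional value once the learning rate has decayed, which is where the claimed speedup comes from.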