Introduces lightweight equilibration to the Muon optimizer, significantly stabilizing and accelerating LLM pretraining.
March 31, 2026
Original Paper
MuonEq: Balancing Before Orthogonalization with Lightweight Equilibration
arXiv · 2603.28254
The Takeaway
Building on the high-performance Muon optimizer, this paper adds row/column normalization to rebalance the momentum matrix before orthogonalization. It yields faster convergence and lower perplexity in LLaMA-scale pretraining with negligible memory overhead.
From the abstract
Orthogonalized-update optimizers such as Muon improve training of matrix-valued parameters, but existing extensions mostly act either after orthogonalization by rescaling updates or before it with heavier whitening-based preconditioners. We introduce MuonEq, a lightweight family of pre-orthogonalization equilibration schemes for Muon in three forms: two-sided row/column normalization (RC), row normalization (R), and column normalization (C). These variants rebalance the momentum matrix before orthogonalization.
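The idea can be sketched in a few lines: scale each row (and/or column) of the momentum matrix to unit RMS norm before handing it to Muon's orthogonalization step. This is only an illustrative sketch, not the paper's implementation; the function names, the eps constant, and the use of a cubic Newton-Schulz iteration (rather than Muon's tuned polynomial) are assumptions made here for clarity.

```python
import numpy as np

def equilibrate_rc(M, eps=1e-8):
    """Two-sided (RC) equilibration sketch: divide each row, then each
    column, of the momentum matrix by its RMS norm so that no single
    row or column dominates before orthogonalization. (Illustrative;
    not the paper's exact scheme.)"""
    M = M / (np.sqrt((M ** 2).mean(axis=1, keepdims=True)) + eps)  # row pass (R)
    M = M / (np.sqrt((M ** 2).mean(axis=0, keepdims=True)) + eps)  # column pass (C)
    return M

def newton_schulz_orthogonalize(M, steps=30):
    """Approximate the orthogonal polar factor of M with the classic
    cubic Newton-Schulz iteration (Muon itself uses a tuned variant)."""
    X = M / (np.linalg.norm(M) + 1e-8)  # bring singular values into (0, 1]
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X

# Equilibrate first, then orthogonalize -- the MuonEq ordering.
rng = np.random.default_rng(0)
momentum = rng.standard_normal((8, 8))
update = newton_schulz_orthogonalize(equilibrate_rc(momentum))
```

The R and C variants mentioned in the abstract would correspond to applying only the row pass or only the column pass. Memory overhead is negligible because only per-row/per-column norms (vectors, not matrices) are computed on the fly.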