AI & ML Efficiency Breakthrough

Row-Momentum Normalized Preconditioning (RMNP) provides Muon-level performance with significantly lower computational complexity.

March 24, 2026

Original Paper

RMNP: Row-Momentum Normalized Preconditioning for Scalable Matrix-Based Optimization

Shenyang Deng, Zhuoli Ouyang, Tianyu Pang, Zihang Liu, Ruochen Jin, Shuhua Yu, Yaoqing Yang

arXiv · 2603.20527

The Takeaway

RMNP replaces expensive Newton-Schulz iterations with simple row-wise L2 normalization of the momentum matrix. This reduces the per-iteration cost of second-order-like preconditioning from cubic to linear in the weight-matrix dimensions, making advanced optimization practical for massive models.
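The core idea can be sketched in a few lines of NumPy. This is an illustrative simplification, not the paper's exact update rule: the hyperparameter names (`beta`, `eps`) and the plain momentum accumulation are assumptions, and RMNP's precise scaling is specified in the paper. The point is that the normalization touches each matrix entry once, so it costs O(m·n) rather than the repeated matrix multiplications of Newton-Schulz.

```python
import numpy as np

def row_normalized_momentum_update(momentum, grad, beta=0.9, eps=1e-8):
    """One sketched optimizer step: accumulate momentum, then L2-normalize
    each row of the momentum matrix. `beta` and `eps` are hypothetical
    hyperparameter names used here for illustration only."""
    momentum = beta * momentum + grad                      # standard momentum buffer
    row_norms = np.linalg.norm(momentum, axis=1, keepdims=True)
    update = momentum / (row_norms + eps)                  # every row now has unit L2 norm
    return momentum, update

# Toy usage on a 4x6 "weight gradient"
rng = np.random.default_rng(0)
momentum = np.zeros((4, 6))
momentum, update = row_normalized_momentum_update(momentum, rng.standard_normal((4, 6)))
```

Because each row of `update` has unit norm regardless of the gradient's scale, the step behaves like a cheap per-row preconditioner.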

From the abstract

Preconditioned adaptive methods have gained significant attention for training deep neural networks, as they capture rich curvature information of the loss landscape. The central challenge in this field lies in balancing preconditioning effectiveness with the computational efficiency of implementing the preconditioner. Among recent advances, Muon stands out by using Newton-Schulz iteration to obtain preconditioned updates without explicitly constructing the preconditioning matrix. Despite …
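For context, the Newton-Schulz iteration that Muon relies on (and that RMNP avoids) can be sketched as follows. This is the generic cubic textbook variant, not Muon's tuned polynomial; it assumes the input has at least as many columns as rows and is pre-scaled so its singular values lie below 1. Each step costs two dense matrix multiplications, which is the expense RMNP eliminates.

```python
import numpy as np

def newton_schulz_orthogonalize(g, steps=25):
    """Cubic Newton-Schulz iteration that drives the singular values of `g`
    toward 1, yielding an orthogonalized update without ever forming an
    explicit preconditioning matrix. Generic variant for illustration;
    assumes g is wide (rows <= columns)."""
    x = g / np.linalg.norm(g)            # Frobenius scaling: singular values <= 1
    for _ in range(steps):
        x = 1.5 * x - 0.5 * x @ x.T @ x  # two matmuls per step: the dominant cost
    return x

rng = np.random.default_rng(0)
q = newton_schulz_orthogonalize(rng.standard_normal((4, 6)))
# q has approximately orthonormal rows: q @ q.T is close to the identity
```

Running this on an m×n layer costs O(m²·n) per step, repeated for several steps, whereas a row-wise normalization is a single O(m·n) pass.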