AI & ML Scaling Insight

Mathematical proof that LayerNorm structurally reduces model complexity compared to RMSNorm due to its mean-centering geometry.

March 31, 2026

Original Paper

The Geometric Cost of Normalization: Affine Bounds on the Bayesian Complexity of Neural Networks

Sungbae Chun

arXiv · 2603.27432

The Takeaway

Using Singular Learning Theory, the paper quantifies the 'Bayesian complexity' cost of normalization, proving that LayerNorm reduces the local learning coefficient by exactly m/2 while RMSNorm preserves it. This provides a theoretical basis for choosing a normalization layer based on the desired model capacity and the curvature of the data manifold.
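Restated in symbols (the notation here is assumed, not taken verbatim from the paper): writing $\lambda$ for the local learning coefficient of the weight matrix $W$ that consumes the normalized activations, the claimed relationship is

```latex
\lambda_{\text{LayerNorm}}(W) \;=\; \lambda_{\text{RMSNorm}}(W) \;-\; \frac{m}{2},
```

where, per the abstract, $m$ is the output dimension. A lower LLC means a simpler model in the Bayesian sense: the posterior concentrates faster, at the cost of effective capacity.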

From the abstract

LayerNorm and RMSNorm impose fundamentally different geometric constraints on their outputs - and this difference has a precise, quantifiable consequence for model complexity. We prove that LayerNorm's mean-centering step, by confining data to a linear hyperplane (through the origin), reduces the Local Learning Coefficient (LLC) of the subsequent weight matrix by exactly $m/2$ (where $m$ is its output dimension); RMSNorm's projection onto a sphere preserves the LLC entirely. This reduction is st
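The geometric difference the abstract describes is easy to verify numerically. The sketch below (a minimal NumPy illustration, not the paper's code; the affine gain/bias parameters are omitted) shows that LayerNorm outputs always sum to zero, i.e. they lie on a hyperplane through the origin, while RMSNorm outputs sit on a sphere of fixed radius but retain a nonzero mean.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Mean-center, then scale to unit variance: the output lies on the
    # hyperplane {y : sum(y) = 0} through the origin.
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def rms_norm(x, eps=1e-6):
    # Scale by the root-mean-square only: the output lies on a sphere
    # (fixed RMS), but its mean is left untouched.
    rms = np.sqrt((x ** 2).mean(-1, keepdims=True) + eps)
    return x / rms

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))

ln, rms = layer_norm(x), rms_norm(x)
print(np.allclose(ln.sum(-1), 0.0))         # hyperplane constraint holds
print(np.allclose((rms ** 2).mean(-1), 1.0))  # fixed RMS (sphere) holds
print(np.allclose(rms.sum(-1), 0.0))        # no mean constraint for RMSNorm
```

The hyperplane constraint removes one direction of freedom per normalized vector, which is the geometric intuition behind the m/2 reduction in the LLC.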