AI & ML Scaling Insight

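Under controlled label noise, the tail index of the eigenvalue distribution at a neural network's bottleneck layer predicts test accuracy almost perfectly (leave-one-out R^2 = 0.984), making it a strong diagnostic for label noise in the training data.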

March 31, 2026

Original Paper

Spectral Signatures of Data Quality: Eigenvalue Tail Index as a Diagnostic for Label Noise in Neural Networks

Matthew Loftus

arXiv · 2603.27885

The Takeaway

The eigenvalue tail index offers a mathematically grounded way to detect poor data quality from the spectral properties of a model's weights alone. This could allow automated data-quality audits of pre-trained models whose original training data is unavailable.
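To make "spectral properties of the weights" concrete, here is a minimal sketch of one common way to estimate a power-law tail index from a single weight matrix. It is an illustration under stated assumptions, not the paper's procedure: this summary does not say which estimator the author uses, so the sketch substitutes the standard Hill estimator, applies it to the eigenvalues of W^T W, and reports alpha as the exponent of the eigenvalue density. The function name `eigenvalue_tail_index` and the `tail_fraction` parameter are hypothetical, and choosing which layer counts as the bottleneck is left out.

```python
import numpy as np


def eigenvalue_tail_index(weight, tail_fraction=0.1):
    """Hill-style estimate of the power-law tail index alpha.

    Assumption (not from the paper): the largest eigenvalues of W^T W
    follow a density p(lam) ~ lam ** (-alpha), and alpha is estimated
    from the top `tail_fraction` of the nonzero eigenvalues.
    """
    weight = np.asarray(weight, dtype=float)
    # Eigenvalues of W^T W are the squared singular values of W.
    singular_values = np.linalg.svd(weight, compute_uv=False)
    eigenvalues = np.sort(singular_values ** 2)[::-1]  # descending order
    eigenvalues = eigenvalues[eigenvalues > 0]

    # Number of eigenvalues treated as "the tail" (at least two).
    k = max(2, int(tail_fraction * eigenvalues.size))
    if eigenvalues.size <= k:
        raise ValueError("not enough nonzero eigenvalues for this tail size")

    tail = eigenvalues[:k]
    threshold = eigenvalues[k]  # largest eigenvalue outside the tail

    # The Hill estimator gives the exponent of the survival function;
    # the exponent of the density is larger by exactly one.
    survival_exponent = 1.0 / np.mean(np.log(tail / threshold))
    return 1.0 + survival_exponent


# Usage on a random matrix standing in for a trained layer's weights.
rng = np.random.default_rng(0)
alpha = eigenvalue_tail_index(rng.standard_normal((256, 128)))
```

In an audit setting, one would compute this value for the chosen layer of each model and compare it across models. The estimate is sensitive to `tail_fraction`, so the same cutoff should be used for every model being compared.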

From the abstract

We investigate whether spectral properties of neural network weight matrices can predict test accuracy. Under controlled label noise variation, the tail index alpha of the eigenvalue distribution at the network's bottleneck layer predicts test accuracy with leave-one-out R^2 = 0.984 (21 noise levels, 3 seeds per level), far exceeding all baselines: the best conventional metric (Frobenius norm of the optimal layer) achieves LOO R^2 = 0.149. This relationship holds across three architectures (MLP,