Under controlled label noise, the eigenvalue tail index of a neural network's bottleneck-layer weight matrix predicts test accuracy almost perfectly (leave-one-out R^2 = 0.984), making it a candidate diagnostic for label noise in the training data.
March 31, 2026
Original Paper
Spectral Signatures of Data Quality: Eigenvalue Tail Index as a Diagnostic for Label Noise in Neural Networks
arXiv · 2603.27885
The Takeaway
The tail index offers a mathematically grounded way to detect poor data quality from the spectral properties of the weights alone. This could enable automated data-quality audits of pre-trained models whose original training data is unavailable.
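To make "looking at the spectral properties of the weights" concrete, here is a minimal sketch of how a tail index could be estimated from a single weight matrix. It is an illustration under stated assumptions, not the paper's procedure: it takes the eigenvalues of W^T W (the squared singular values of W) and applies the classical Hill estimator to the k largest ones. The function names `esd_eigenvalues` and `hill_tail_index`, the choice of the Hill estimator, and the cutoff `k` are all assumptions introduced here; the paper may define and fit alpha differently (for example, as a density exponent rather than a survival-function exponent, which differ by 1).

```python
import numpy as np


def esd_eigenvalues(weight):
    """Return the eigenvalues of W^T W, computed as squared singular values of W."""
    singular_values = np.linalg.svd(np.asarray(weight, dtype=float), compute_uv=False)
    return singular_values ** 2


def hill_tail_index(values, k):
    """Hill estimate of alpha for a tail P(X > x) ~ x^(-alpha).

    Uses the k largest values, with the (k+1)-th largest as the threshold.
    ASSUMPTION: this estimator is a stand-in; the source does not say which
    estimator the paper uses.
    """
    x = np.sort(np.asarray(values, dtype=float))[::-1]  # descending order
    if not 1 <= k < x.size:
        raise ValueError("k must satisfy 1 <= k < number of values")
    if x[k] <= 0:
        raise ValueError("threshold value must be positive")
    log_excess = np.log(x[:k]) - np.log(x[k])
    return 1.0 / log_excess.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-in for a trained layer's weight matrix.
    weight = rng.standard_normal((256, 128))
    eigenvalues = esd_eigenvalues(weight)
    alpha = hill_tail_index(eigenvalues, k=16)
    print(f"estimated tail index: {alpha:.2f}")
```

The estimate is sensitive to `k`, and it is only meaningful when the upper end of the spectrum actually follows a power law; a random Gaussian matrix like the one in the demo has no heavy tail, so the number it prints is not a data-quality signal.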
From the abstract
We investigate whether spectral properties of neural network weight matrices can predict test accuracy. Under controlled label noise variation, the tail index alpha of the eigenvalue distribution at the network's bottleneck layer predicts test accuracy with leave-one-out R^2 = 0.984 (21 noise levels, 3 seeds per level), far exceeding all baselines: the best conventional metric (Frobenius norm of the optimal layer) achieves LOO R^2 = 0.149. This relationship holds across three architectures (MLP,