Uses the Minimum Description Length principle to predict exactly when neural networks will transition from simple 'spurious' shortcuts to complex features.
March 30, 2026
Original Paper
A Compression Perspective on Simplicity Bias
arXiv · 2603.25839
The Takeaway
The paper provides a theoretical 'map' of how data volume influences feature selection in neural networks. This helps practitioners diagnose why models fail (by latching onto simple but spurious features) and estimate how much data is needed to force the learning of robust cues.
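The data-volume dependence can be sketched with a toy two-part MDL calculation (illustrative only, not the paper's actual model): a simple spurious feature has a low model-description cost but compresses each example poorly, while a robust feature costs more bits to describe but compresses each example better. The specific bit counts below are hypothetical.

```python
# Toy two-part MDL trade-off (hypothetical numbers, not from the paper).
# Total description length: L(H) + n * (bits per example under H).

def description_length(model_bits: float, bits_per_example: float, n: int) -> float:
    """Two-part code length: model cost plus data cost for n examples."""
    return model_bits + n * bits_per_example

# Simple/spurious feature: cheap to describe, compresses data poorly.
SIMPLE = {"model_bits": 100.0, "bits_per_example": 0.9}
# Robust feature: expensive to describe, compresses data well.
ROBUST = {"model_bits": 10_000.0, "bits_per_example": 0.4}

def preferred(n: int) -> str:
    """Which hypothesis yields the shorter total code at dataset size n."""
    l_simple = description_length(SIMPLE["model_bits"], SIMPLE["bits_per_example"], n)
    l_robust = description_length(ROBUST["model_bits"], ROBUST["bits_per_example"], n)
    return "simple" if l_simple <= l_robust else "robust"

# Crossover: n* = (10000 - 100) / (0.9 - 0.4) = 19800 examples.
# Below n*, MDL favors the simple shortcut; above it, the robust feature.
```

With these made-up numbers, `preferred(1_000)` returns `"simple"` while `preferred(30_000)` returns `"robust"`, mirroring the claim that sufficient data can tip the compression trade-off toward robust cues.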
From the abstract
Deep neural networks exhibit a simplicity bias, a well-documented tendency to favor simple functions over complex ones. In this work, we cast new light on this phenomenon through the lens of the Minimum Description Length principle, formalizing supervised learning as a problem of optimal two-part lossless compression. Our theory explains how simplicity bias governs feature selection in neural networks through a fundamental trade-off between model complexity (the cost of describing the hypothesis […]