AI models don't get smart by remembering everything; they get smart by figuring out exactly what they need to forget.
April 10, 2026
Original Paper
Learning is Forgetting: LLM Training As Lossy Compression
arXiv · 2604.07569
The Takeaway
This paper reframes AI training as lossy compression: 'learning' is the progressive removal of task-irrelevant noise until the model reaches an information-theoretic bottleneck. That framing yields a mathematical account of why some models perform better than others: they are more effective at discarding irrelevant data while retaining the information their objective actually requires.
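The "information-theoretic bottleneck" the takeaway points to is usually formalized as the Information Bottleneck objective (Tishby et al.); a standard statement, with notation assumed here rather than taken from this paper: for input X, objective Y, and internal representation Z, training seeks an encoder solving

\min_{p(z \mid x)} \; I(X; Z) - \beta \, I(Z; Y)

that is, keep as little of the input as possible (small I(X;Z)) while preserving what predicts the objective (large I(Z;Y)), with β setting the trade-off between compression and relevance.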
From the abstract
Despite the increasing prevalence of large language models (LLMs), we still have a limited understanding of how their representational spaces are structured. This limits our ability to interpret how and what they learn or relate them to learning in humans. We argue LLMs are best seen as an instance of lossy compression, where over training they learn by retaining only information in their training data relevant to their objective(s). We show pre-training results in models that are optimally compressed […]
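To make the abstract's claim concrete, here is a minimal self-contained toy (not from the paper; the construction and variable names are purely illustrative): an input x mixes one task-relevant bit y with irrelevant noise, and we compare two "representations" of it. The lossy one forgets the noise yet keeps all of the task information, which is the kind of compression the authors describe.

import numpy as np
from collections import Counter

def mutual_information(a, b):
    # Empirical mutual information (in bits) between two discrete sequences.
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * np.log2((c / n) / ((pa[u] / n) * (pb[v] / n)))
        for (u, v), c in pab.items()
    )

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=100_000)       # task-relevant bit
noise = rng.integers(0, 8, size=100_000)   # 3 bits of task-irrelevant noise
x = y * 8 + noise                          # input mixes signal and noise

z_lossless = x        # representation that remembers everything
z_lossy = x // 8      # representation that "forgets" the noise, keeping only y

for name, z in [("lossless", z_lossless), ("lossy", z_lossy)]:
    print(name,
          "I(X;Z) = %.2f bits" % mutual_information(x, z),
          "I(Z;Y) = %.2f bits" % mutual_information(z, y))

On this toy, the lossless code carries about 4 bits of input information while the lossy code carries about 1 bit, yet both retain the full 1 bit of information about y: forgetting the noise costs nothing on the task.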