AI & ML Paradigm Challenge

AI models don't get smart by remembering everything; they get smart by figuring out exactly what they need to forget.

April 10, 2026

Original Paper

Learning is Forgetting: LLM Training As Lossy Compression

Henry C. Conklin, Tom Hosking, Tan Yi-Chern, Julian Gold, Jonathan D. Cohen, Thomas L. Griffiths, Max Bartolo, Seraphina Goldfarb-Tarrant

arXiv · 2604.07569

The Takeaway

This paper reframes LLM training as lossy compression: 'learning' is essentially the progressive removal of noise until the model reaches an information-theoretic bottleneck. It offers a mathematical account of why some models perform better than others, grounded in how effectively they discard task-irrelevant data.
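The intuition behind "learning as lossy compression" can be illustrated with a toy sketch (not from the paper): a low-rank "signal" matrix observed through noise. Keeping everything preserves the noise, while a lossy truncated-SVD compression that forgets the small singular directions recovers the underlying signal better. The rank, noise scale, and seed below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# True "signal": a rank-2 matrix; observations add dense noise on top.
U = rng.normal(size=(100, 2))
V = rng.normal(size=(2, 50))
signal = U @ V
noisy = signal + 0.5 * rng.normal(size=signal.shape)

def lossy_compress(X, k):
    """Keep only the top-k singular directions, discarding the rest."""
    u, s, vt = np.linalg.svd(X, full_matrices=False)
    return (u[:, :k] * s[:k]) @ vt[:k]

# Lossless retention keeps the noise; lossy compression forgets it.
err_lossless = np.linalg.norm(noisy - signal)
err_lossy = np.linalg.norm(lossy_compress(noisy, k=2) - signal)
assert err_lossy < err_lossless  # forgetting the noise gets closer to the signal
```

The point of the sketch is the paper's framing in miniature: the compressed model is not worse for having thrown information away, because what it threw away was irrelevant to the objective.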

From the abstract

Despite the increasing prevalence of large language models (LLMs), we still have a limited understanding of how their representational spaces are structured. This limits our ability to interpret how and what they learn or relate them to learning in humans. We argue LLMs are best seen as an instance of lossy compression, where over training they learn by retaining only information in their training data relevant to their objective(s). We show pre-training results in models that are optimally compressed…
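The abstract's claim that models retain "only information relevant to their objective(s)" can be made concrete with a minimal, hypothetical example (not from the paper): if the label depends on a single feature, a representation that forgets every other feature loses nothing with respect to that objective, despite being a fraction of the size.

```python
import numpy as np

rng = np.random.default_rng(1)

# Ten input features, but the objective (the label) depends only on feature 0.
x = rng.normal(size=(1000, 10))
y = (x[:, 0] > 0).astype(int)

# "Compressed" representation: keep the one objective-relevant feature.
z = x[:, :1]

def accuracy(feats, labels):
    # A trivial threshold predictor standing in for a trained model.
    preds = (feats[:, 0] > 0).astype(int)
    return (preds == labels).mean()

# The 10x-smaller representation is exactly as useful for the objective.
assert accuracy(z, y) == accuracy(x, y)
```

This is the degenerate best case; the paper's argument is that pre-training approximates this behavior statistically, discarding information in proportion to its irrelevance rather than all at once.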