AI & ML Paradigm Challenge

Intelligence may be nothing more than extreme data compression, and a "V-shaped" pattern in a model's layers is the evidence.

April 24, 2026

Original Paper

Intelligence as Predictive Compression: Evidence from GPT-2 Analysis and Learned Concept Bottlenecks

Ahmed Ghazouani

SSRN · 6376458

The Takeaway

This analysis of GPT-2 shows that the model's layers act as a funnel that crystallizes information into minimal representations. As the model gets smarter, it doesn't just learn more; it learns to store what it knows in a smaller, more efficient form. This suggests that by enforcing the same compression explicitly, we could make AI much smaller without losing its power. It shifts our understanding of thinking from an expansive process to a restrictive one, and it could lead to high-performance AI that runs easily on local devices like phones. On this view, efficiency is the true mark of intelligence in both biological and artificial systems.
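To make "enforcing the compression" concrete, here is a minimal sketch of a learned concept bottleneck in the spirit of the paper's title: a layer's hidden states are squeezed through a much narrower code and projected back. The module, its dimensions, and the ReLU activation are illustrative assumptions, not the paper's architecture; the 200-dimensional code simply echoes the ∼200 equivalence classes quoted below.

```python
# Illustrative sketch only: a narrow "concept" code inserted into a
# transformer's residual stream. Dimensions are assumptions, chosen to
# match GPT-2 (d_model=768) and the ~200 classes reported in the paper.
import torch
import torch.nn as nn

class ConceptBottleneck(nn.Module):
    """Compress a hidden state to n_concepts dims, then project back."""
    def __init__(self, d_model: int, n_concepts: int):
        super().__init__()
        self.encode = nn.Linear(d_model, n_concepts)  # compress
        self.decode = nn.Linear(n_concepts, d_model)  # re-expand

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        concepts = torch.relu(self.encode(h))  # low-dimensional code
        return self.decode(concepts)           # reconstruction fed onward

bottleneck = ConceptBottleneck(d_model=768, n_concepts=200)
h = torch.randn(4, 16, 768)            # (batch, seq_len, d_model)
print(bottleneck(h).shape)             # torch.Size([4, 16, 768])
```

Trained end to end (for example, against a frozen model's next-token predictions), such a bottleneck is one way to test whether ∼200 dimensions really suffice.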

From the abstract

We present a mathematical framework connecting intelligence to predictive compression through ε-machines (minimal sufficient statistics of the past for predicting the future) and demonstrate that modern transformer language models implicitly implement this compression. Through systematic reverse-engineering of GPT-2, we reveal a three-phase "V-shape" crystallization pattern: tokens compress into ∼200 predictive equivalence classes by layer 2, undergo controlled semantic disambiguation in middle layers […]
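In computational mechanics, an ε-machine partitions pasts into causal states: two histories belong to the same state exactly when they induce the same conditional distribution over futures, P(future | past₁) = P(future | past₂); those equivalence classes are the "minimal sufficient statistics" the abstract invokes. As a rough illustration of how such layer-wise compression might be probed empirically, the sketch below extracts GPT-2's per-layer hidden states and tracks an effective dimensionality of the token representations; the participation-ratio metric and the toy single-sentence input are our assumptions, not the paper's procedure, which counts predictive equivalence classes.

```python
# Rough sketch of one way to probe layer-wise compression in GPT-2:
# track an effective dimensionality (participation ratio of singular
# values) of the token representations at each layer. The metric and
# the toy input are assumptions for illustration only.
import torch
from transformers import GPT2Model, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

inputs = tokenizer("Intelligence may be predictive compression.",
                   return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

# out.hidden_states: embedding output plus one tensor per layer,
# each of shape (1, seq_len, 768).
for i, h in enumerate(out.hidden_states):
    x = h[0] - h[0].mean(dim=0)      # center the token cloud
    s = torch.linalg.svdvals(x)      # singular values per layer
    p = s**2 / (s**2).sum()
    eff_dim = 1.0 / (p**2).sum()     # participation ratio
    print(f"layer {i:2d}: effective dim ≈ {eff_dim.item():.1f}")
```

Pooled over a large corpus, the reported V-shape would show up as this number dropping sharply through the first couple of layers and then partially recovering in later ones.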