Distillation makes an AI smarter at answering questions while simultaneously making it 20% more likely to lie with total confidence.
April 23, 2026
Original Paper
The Illusion of Certainty: Decoupling Capability and Calibration in On-Policy Distillation
arXiv · 2604.16830
The Takeaway
Teaching a small model to imitate a larger teacher creates a dangerous side effect: overconfidence. The student learns to mimic the teacher's answers without acquiring the same depth of understanding, so it becomes extremely certain of its responses even when it is completely wrong. This illusion of certainty makes the model less useful for high-stakes decisions, where knowing the limits of one's knowledge is vital. We used to think distillation was a free lunch for model efficiency. Now we know it breaks the model's ability to be honest about its own confusion.
From the abstract
On-policy distillation (OPD) is an increasingly important paradigm for post-training language models. However, we identify a pervasive Scaling Law of Miscalibration: while OPD effectively improves task accuracy, it systematically traps models in severe overconfidence. We trace this failure to an information mismatch: teacher supervision is formed under privileged context available during training, whereas the deployed model must report confidence using only deployment-time information. We formal […]
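The overconfidence the abstract describes is typically quantified with a calibration metric such as expected calibration error (ECE): predictions are grouped into confidence bins, and each bin's average stated confidence is compared to its actual accuracy. The paper's exact evaluation protocol isn't shown here; the following is a minimal sketch of the standard ECE computation on toy data, where a calibrated model scores near zero and an overconfident one does not.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard ECE: bin predictions by stated confidence, then take the
    bin-size-weighted average of |mean confidence - empirical accuracy|."""
    conf = np.asarray(confidences, dtype=float)
    corr = np.asarray(correct, dtype=float)
    # Assign each prediction to one of n_bins equal-width confidence bins.
    idx = np.clip(np.ceil(conf * n_bins).astype(int) - 1, 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            gap = abs(conf[mask].mean() - corr[mask].mean())
            ece += mask.mean() * gap  # weight the gap by the bin's share
    return ece

# Calibrated: says 90% confident, is right 9 times out of 10 -> ECE ~ 0.
calibrated = expected_calibration_error([0.9] * 10, [1] * 9 + [0])
# Overconfident (the distilled-student failure mode): says 99% confident,
# is right only 6 times out of 10 -> large ECE.
overconfident = expected_calibration_error([0.99] * 10, [1] * 6 + [0] * 4)
```

A model can thus gain accuracy while its ECE worsens, which is exactly the decoupling of capability and calibration the paper's title refers to.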