When you shrink an AI to fit on a phone, it doesn’t just get slower—it gets weirdly cocky about things it’s wrong about and shy about things it actually knows.
April 13, 2026
Original Paper
Quantisation Reshapes the Metacognitive Geometry of Language Models
arXiv · 2604.08976
The Takeaway
This reveals that quantisation reshuffles the model’s internal self-awareness in unpredictable ways rather than simply dulling it. Developers can no longer assume a quantised model is just a 'lower-resolution' version of the original; its very sense of which domains it can trust itself on has been rearranged.
From the abstract
We report that model quantisation restructures domain-level metacognitive efficiency in LLMs rather than degrading it uniformly. Evaluating Llama-3-8B-Instruct on the same 3,000 questions at Q5_K_M and f16 precision, we find that M-ratio profiles across four knowledge domains are uncorrelated between formats (Spearman rho = 0.00). Arts & Literature moves from worst-monitored (M-ratio = 0.606 at Q5_K_M) to best-monitored (1.542 at f16). Geography moves from well-monitored (1.210) to under-monitored …