AI & ML Nature Is Weird

Large models know they are about to lie before they even output the first token, but small models are completely clueless.

April 17, 2026

Original Paper

Before the First Token: Scale-dependent Emergence of Hallucination Signals in Autoregressive Language Models

SSRN · 6465859

The Takeaway

This study finds that once a model passes roughly 400M parameters, it develops an internal representation of 'truthfulness': a hallucination signal is visible in the model's internal states before it generates the first token. Smaller models don't show this signal; they are effectively confabulating without any internal trace of their own errors. This means that, for models above a certain scale, we could build 'pre-output' filters that catch hallucinations before the user ever sees them. It also suggests that self-awareness of truth is an emergent property of scale.
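To make the 'pre-output filter' idea concrete, here is a minimal sketch of a linear probe over pre-generation hidden states. Everything here is an assumption for illustration: the hidden states are simulated as random vectors with a class-dependent mean shift, the hidden dimension is arbitrary, and a simple difference-of-means probe stands in for whatever classifier the paper actually trains.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 64, 500  # hidden-state dimension and examples per class (both assumptions)

# Simulated pre-token hidden states: 'truthful' and 'hallucinating'
# activations drawn from the same distribution with shifted means,
# standing in for real internal states read off before generation.
truthful = rng.normal(0.0, 1.0, (n, d)) + 0.5
halluc = rng.normal(0.0, 1.0, (n, d)) - 0.5

# Difference-of-means linear probe: project a hidden state onto the
# direction separating the two classes, and compare against the midpoint.
w = truthful.mean(axis=0) - halluc.mean(axis=0)
midpoint = ((truthful.mean(axis=0) + halluc.mean(axis=0)) / 2) @ w

def pre_output_filter(hidden_state: np.ndarray) -> bool:
    """Return True (suppress the generation) when the probe reads the
    pre-token hidden state as hallucination-like."""
    return bool(hidden_state @ w < midpoint)
```

In a real deployment the probe would be trained on labeled hidden states captured from the model itself, and the filter would run once per prompt, before decoding starts, which is what makes the 'before the first token' framing interesting.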

From the abstract

When do large language models decide to generate false information? Doing so in fields such as health care, law, scientific research, or financial decision-making has serious consequences, yet there are still few formal answers to this question. Recent studies have shown differences in how autoregressive language models internally represent whether they are providing factual versus fictional responses, demonstrating that these models have some form of inte…