Discovers that LLM hidden states undergo geometric 'warping' at digit-count boundaries, mimicking human psychological perception.
March 31, 2026
Original Paper
Categorical Perception in Large Language Model Hidden States: Structural Warping at Digit-Count Boundaries
arXiv · 2603.28258
The Takeaway
The study shows that structural tokenization discontinuities (e.g., the digit-count jump from 9 to 10) create discrete 'category' boundaries in the model's internal geometry. This challenges the idea that LLMs represent numbers as continuous magnitudes and suggests that architectural choices shape semantic representation independently of training data.
From the abstract
Categorical perception (CP) -- enhanced discriminability at category boundaries -- is among the most studied phenomena in perceptual psychology. This paper reports that analogous geometric warping occurs in the hidden-state representations of large language models (LLMs) processing Arabic numerals. Using representational similarity analysis across six models from five architecture families, the study finds that a CP-additive model (log-distance plus a boundary boost) fits the representational geometry …
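The CP-additive model named in the abstract can be sketched with synthetic data. This is a minimal illustration, not the paper's code: the feature names, the toy number range, and the synthetic "observed" dissimilarity matrix (standing in for RSA distances between hidden states) are all assumptions for demonstration.

```python
import numpy as np

# Hypothetical sketch of a CP-additive dissimilarity model:
# predicted distance = a * |log(i) - log(j)| + b * [i and j differ in digit count]

def cp_additive_features(numbers):
    """Build the two predictor matrices: log-distance and digit-count boundary."""
    n = len(numbers)
    log_dist = np.zeros((n, n))
    boundary = np.zeros((n, n))
    for x in range(n):
        for y in range(n):
            i, j = numbers[x], numbers[y]
            log_dist[x, y] = abs(np.log(i) - np.log(j))
            boundary[x, y] = float(len(str(i)) != len(str(j)))
    return log_dist, boundary

numbers = list(range(1, 21))  # spans the 9 -> 10 digit-count boundary
log_dist, boundary = cp_additive_features(numbers)

# Synthetic "observed" dissimilarities: log-distance plus a boundary boost,
# with noise; in the paper these would come from hidden-state RSA.
rng = np.random.default_rng(0)
observed = 1.0 * log_dist + 0.5 * boundary + 0.05 * rng.standard_normal(log_dist.shape)

# Recover the weights by least squares on the upper triangle of the matrices.
iu = np.triu_indices(len(numbers), k=1)
X = np.column_stack([log_dist[iu], boundary[iu]])
a, b = np.linalg.lstsq(X, observed[iu], rcond=None)[0]
print(f"log-distance weight a = {a:.2f}, boundary boost b = {b:.2f}")
```

A nonzero fitted boundary boost `b` is what distinguishes categorical warping from a purely continuous (log-distance only) account of numeral representation.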