Hidden states in LLMs occupy a Riemannian submanifold on which tokens correspond to Voronoi regions, revealing a universal 'hourglass' intrinsic-dimension profile across all tested models.
March 25, 2026
Original Paper
Latent Semantic Manifolds in Large Language Models
arXiv · 2603.22301
The Takeaway
The paper provides a geometric explanation for why quantization works (or fails) and identifies a persistent 'hard core' of representations near region boundaries, offering a theoretical basis for improving model compression and decoding strategies.
From the abstract
Large Language Models (LLMs) perform internal computations in continuous vector spaces yet produce discrete tokens -- a fundamental mismatch whose geometric consequences remain poorly understood. We develop a mathematical framework that interprets LLM hidden states as points on a latent semantic manifold: a Riemannian submanifold equipped with the Fisher information metric, where tokens correspond to Voronoi regions partitioning the manifold. We define the expressibility gap, a geometric measure
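The "tokens as Voronoi regions" picture can be illustrated with a toy sketch: under standard argmax decoding, the unembedding matrix partitions hidden-state space into cells, one per token. This is only a Euclidean stand-in for the paper's Fisher-metric construction, and all names below (`W_unembed`, `h`, the margin heuristic) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: a small "vocabulary" of unembedding vectors and one hidden state.
# This Euclidean sketch only approximates the Fisher-metric geometry the
# paper describes; names here are hypothetical.
d, vocab_size = 8, 5
W_unembed = rng.normal(size=(vocab_size, d))  # rows = token directions
h = rng.normal(size=d)                        # a hidden state

# Decoding assigns h to the token with the largest logit. The argmax over
# logits = W_unembed @ h partitions hidden-state space into cells, the
# discrete-token analogue of the Voronoi regions described above.
logits = W_unembed @ h
token = int(np.argmax(logits))

# A rough intuition for a boundary-sensitive quantity: the margin between
# the top two logits measures how close h sits to a cell boundary.
top2 = np.sort(logits)[-2:]
margin = float(top2[1] - top2[0])
print(token, margin)
```

Hidden states with a small margin sit near a cell boundary, which is where one would expect perturbations such as quantization noise to flip the decoded token.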