Statistical patterns in speech sounds may help trace the origins of language families beyond the reach of historical records.
April 14, 2026
Original Paper
Phonological distances for linguistic typology and the origin of Indo-European languages
arXiv · 2604.11565
The Takeaway
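Using information theory and a "molecular clock" approach to language, researchers measured phonological distances among 67 modern languages and traced their geographical origins to the Steppe. The result suggests that linguistic evolution leaves statistical regularities that can be modeled quantitatively, much as physicists model collections of atoms, and that these regularities can help address open historical questions.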
From the abstract
We show that short-range phoneme dependencies encode large-scale patterns of linguistic relatedness, with direct implications for quantitative typology and evolutionary linguistics. Specifically, using an information-theoretic framework, we argue that phoneme sequences modeled as second-order Markov chains essentially capture the statistical correlations of a phonological system. This finding enables us to quantify distances among 67 modern languages from a multilingual parallel corpus employing
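To make the abstract's idea concrete, here is a minimal sketch of how a second-order Markov description of phoneme sequences might be turned into a distance between languages. It is an illustration under stated assumptions, not the paper's method: the excerpt does not say which distance the authors use, so the Jensen-Shannon divergence between trigram distributions is swapped in here as a stand-in. The function names and the toy "languages" (single characters standing in for phonemes) are hypothetical.

```python
from collections import Counter
from math import log2


def trigram_distribution(sequences):
    """Empirical distribution over phoneme trigrams (a, b, c).

    A second-order Markov chain conditions each symbol on the two before it,
    so trigram counts are the statistics needed to fit one.
    """
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - 2):
            counts[tuple(seq[i:i + 3])] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one sequence of length >= 3")
    return {tri: n / total for tri, n in counts.items()}


def transition_probs(dist):
    """Conditional probabilities P(c | a, b) derived from a trigram distribution."""
    context_mass = Counter()
    for (a, b, _), p in dist.items():
        context_mass[(a, b)] += p
    return {(a, b, c): p / context_mass[(a, b)] for (a, b, c), p in dist.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (assumed stand-in for the paper's distance).

    Zero for identical distributions, one for distributions with disjoint support.
    """
    total = 0.0
    for key in set(p) | set(q):
        pk, qk = p.get(key, 0.0), q.get(key, 0.0)
        mk = 0.5 * (pk + qk)
        if pk > 0:
            total += 0.5 * pk * log2(pk / mk)
        if qk > 0:
            total += 0.5 * qk * log2(qk / mk)
    return total


# Toy data, purely illustrative: each character stands in for one phoneme.
lang_a = ["patapata", "tapata"]
lang_b = ["patapita", "tapita"]
lang_c = ["kukuku"]

dist_a = trigram_distribution(lang_a)
dist_b = trigram_distribution(lang_b)
dist_c = trigram_distribution(lang_c)

d_ab = js_divergence(dist_a, dist_b)  # overlapping sound patterns
d_ac = js_divergence(dist_a, dist_c)  # no shared trigrams
```

In this toy setup `lang_a` and `lang_b` share several trigrams while `lang_c` shares none with `lang_a`, so `d_ab` comes out smaller than `d_ac`. A real analysis would need phonemic transcriptions of a parallel corpus and some handling of rare or unseen trigrams, neither of which is attempted here.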