AI & ML Collision

Information theory applied to phoneme sequences may help trace where the Indo-European languages originated.

April 14, 2026

Original Paper

Phonological distances for linguistic typology and the origin of Indo-European languages

Marius Mavridis, Juan De Gregorio, Raul Toral, David Sanchez

arXiv · 2604.11565

The Takeaway

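Using information theory and a "molecular clock" analogy for language, the researchers measured phonological distances among 67 modern languages and related them to a geographical origin in the Steppe. The results suggest that language evolution leaves statistical regularities that tools from physics can quantify, which may help address open questions in historical linguistics by treating phoneme sequences as statistical data.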
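To make the idea concrete, here is a minimal Python sketch of the kind of computation involved. The paper's abstract describes modeling phoneme sequences as second-order Markov chains, so the sketch estimates the entropy of a symbol given the two symbols before it, then compares two sequences. Everything else is an assumption for illustration: the function names are hypothetical, plain characters stand in for phonemes, the toy strings are invented, and the Jensen-Shannon divergence is a generic stand-in rather than the distance the authors use.

```python
from collections import Counter
from math import log2


def trigram_distribution(phonemes):
    """Empirical distribution over consecutive triples (x[t-2], x[t-1], x[t])."""
    counts = Counter(zip(phonemes, phonemes[1:], phonemes[2:]))
    total = sum(counts.values())
    return {tri: c / total for tri, c in counts.items()}


def second_order_entropy(phonemes):
    """Conditional entropy H(x[t] | x[t-2], x[t-1]) in bits.

    This is the per-symbol uncertainty of a second-order Markov chain
    fitted to the sequence by simple counting (no smoothing).
    """
    tri = Counter(zip(phonemes, phonemes[1:], phonemes[2:]))
    ctx = Counter()
    for (a, b, _), c in tri.items():
        ctx[(a, b)] += c  # how often each two-symbol context occurs
    total = sum(tri.values())
    return -sum(
        (c / total) * log2(c / ctx[(a, b)]) for (a, b, _), c in tri.items()
    )


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint support).

    Assumption: used here only as an illustrative distance, not the paper's.
    """
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        return sum(v * log2(v / m[k]) for k, v in a.items() if v > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


# Invented toy "languages": characters stand in for phonemes.
toy_a = list("patakapatakapataka")
toy_b = list("patikapatikapatika")
toy_c = list("mununomununomununo")

dist_ab = js_divergence(trigram_distribution(toy_a), trigram_distribution(toy_b))
dist_ac = js_divergence(trigram_distribution(toy_a), trigram_distribution(toy_c))
print(f"H(a) = {second_order_entropy(toy_a):.3f} bits")
print(f"d(a, b) = {dist_ab:.3f}, d(a, c) = {dist_ac:.3f}")
```

A matrix of such pairwise distances is the usual input for clustering languages or building a tree. Real corpora would also need phonemic transcription and some smoothing of the counts, both of which this sketch leaves out.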

From the abstract

We show that short-range phoneme dependencies encode large-scale patterns of linguistic relatedness, with direct implications for quantitative typology and evolutionary linguistics. Specifically, using an information-theoretic framework, we argue that phoneme sequences modeled as second-order Markov chains essentially capture the statistical correlations of a phonological system. This finding enables us to quantify distances among 67 modern languages from a multilingual parallel corpus employing