If we keep feeding AI models their own generated text, they eventually develop a kind of digital dementia in which the richness of human language drains away.
April 13, 2026
Original Paper
Drift and selection in LLM text ecosystems
arXiv · 2604.08554
The Takeaway
This gives the fear of model collapse a formal mathematical footing: in an exactly solvable framework, a feedback loop of AI learning from AI output doesn't merely stagnate, it actively destroys information. The warning is stark: without a steady supply of fresh human data, the public internet could become a linguistic desert.
From the abstract
The public text record -- the material from which both people and AI systems now learn -- is increasingly shaped by its own outputs. Generated text enters the public record, later agents learn from it, and the cycle repeats. Here we develop an exactly solvable mathematical framework for this recursive process, based on variable-order $n$-gram agents, and separate two forces acting on the public corpus. The first is drift: unfiltered reuse progressively removes rare forms, and in the infinite-cor
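The drift mechanism the abstract describes can be sketched with a toy simulation (my own illustration, not code from the paper): a unigram "agent" that repeatedly retrains on its own finite output loses rare word types generation by generation, even with no filtering or bias at all.

```python
import random

def resample_corpus(corpus, size, rng):
    """Draw a new corpus of `size` tokens by sampling, with replacement,
    from the empirical distribution of the current corpus."""
    return rng.choices(corpus, k=size)

# Toy setup (an assumption for illustration, not the paper's exact model):
# a Zipf-like corpus with a few common word types and many rare ones.
rng = random.Random(0)
corpus = [f"w{i}" for i in range(50) for _ in range(50 // (i + 1))]

# Each "generation", the agent retrains on its own sampled output.
# Rare types, once dropped by chance, can never return, so the
# vocabulary can only shrink over time.
vocab_sizes = [len(set(corpus))]
for generation in range(30):
    corpus = resample_corpus(corpus, len(corpus), rng)
    vocab_sizes.append(len(set(corpus)))

print(vocab_sizes[0], "->", vocab_sizes[-1])
```

Pure random reuse is enough to produce the one-way loss: sampling error removes a rare form occasionally, and without fresh human data there is no mechanism to reintroduce it.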