Millions of websites are now just AI agents talking to other AI agents, and this machine-made content is already dominating search results.
Empirical evidence from Common Crawl and Bing suggests that the "dead internet theory" describes a measurable trend. LLM-dominant websites are spreading rapidly and often outrank human-written content in major search engines. This trend may mark a lasting shift in how information is created and consumed on the open web, as the internet moves from a platform for human expression toward a feedback loop for synthetic data. Practitioners should account for the possibility that much of the training data for future models will have been generated by earlier models.
DeGenTWeb: A First Look at LLM-dominant Websites
arXiv · 2605.00087
Many recent news reports have claimed that content generated by large language models (LLMs) is taking over the web. However, these claims are typically not based on a representative sample of the web and the methodology underlying them is often opaque. Moreover, when aiming to minimize the chances of falsely attributing human-authored content to LLMs, we find that detectors of LLM-generated text perform much worse than advertised. Consequently, we lack an understanding of the true prevalence an