economics Paradigm Challenge

The most factual websites are the ones most likely to block AI from reading their data, leaving a lower-quality pool for training.

April 10, 2026

Original Paper

Adverse Selection in the AI Data Commons

Kai Zhu

SSRN · 6438640

The Takeaway

High-quality, factual news sites are opting out of AI training data at nearly six times the rate of low-factual sources, while misinformation sources remain the most accessible. As the best producers exit, future AIs risk being trained on an increasingly degraded pool of web content.

From the abstract

Generative AI depends on high-quality web content, yet no market compensates its producers. We document adverse selection in this AI data commons: facing a binary opt-out choice, the highest-quality producers exit first, degrading the remaining commons. Studying media and news sites at scale, we find a steep quality-blocking gradient: high-factual outlets block at nearly six times the rate of low-factual sources, with misinformation sources remaining most accessible. Publishers strategically tar