AI is getting more biased because the most honest websites are the ones blocking AI from reading their data.
April 10, 2026
Original Paper
Adverse Selection in the AI Data Commons
SSRN · 6438640
The Takeaway
High-quality, factual websites are blocking AI crawlers at nearly six times the rate of low-quality sources, while misinformation sources remain the most accessible. Because opting out is all-or-nothing and no market pays publishers for their content, the best sources leave first. Future AI models therefore risk being trained on a pool of data increasingly skewed toward misinformation.
From the abstract
Generative AI depends on high-quality web content, yet no market compensates its producers. We document adverse selection in this AI data commons: facing a binary opt-out choice, the highest-quality producers exit first, degrading the remaining commons. Studying media and news sites at scale, we find a steep quality-blocking gradient: high-factual outlets block at nearly six times the rate of low-factual sources, with misinformation sources remaining most accessible. Publishers strategically tar