AI 'scientists' are often just hallucinating patterns in random noise and telling you they're 99% sure about it.
April 14, 2026
Original Paper
Sanity Checks for Agentic Data Science
arXiv · 2604.11003
The Takeaway
The research shows that data science agents frequently reach affirmative conclusions from pure noise and are poorly self-calibrated about their own uncertainty. This motivates a new 'falsifiability' sanity check for any scientific discovery driven by AI agents.
From the abstract
Agentic data science (ADS) pipelines have grown rapidly in both capability and adoption, with systems such as OpenAI Codex now able to directly analyze datasets and produce answers to statistical questions. However, these systems can reach falsely optimistic conclusions that are difficult for users to detect. To address this, we propose a pair of lightweight sanity checks grounded in the Predictability-Computability-Stability (PCS) framework for veridical data science. These checks use reasonabl…
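To make the idea concrete: one common way to test whether an analysis would also "discover" an effect in pure noise is a permutation (null-data) check. The sketch below is illustrative only, assuming a shuffle-the-labels design; the function names and thresholds here are not from the paper, which proposes its own PCS-grounded checks.

```python
import numpy as np

def null_dataset_check(analysis, X, y, n_trials=100, rng=None):
    """Falsifiability-style sanity check (illustrative, not the paper's method):
    rerun `analysis` on label-shuffled copies of the data. If it scores as
    well on noise as on the real data, its affirmative conclusion is suspect.
    `analysis` is any callable (X, y) -> scalar effect score."""
    rng = np.random.default_rng(rng)
    real_score = analysis(X, y)
    null_scores = np.array(
        [analysis(X, rng.permutation(y)) for _ in range(n_trials)]
    )
    # Fraction of null runs matching or beating the real score
    # (+1 correction so the estimate is never exactly zero)
    p_null = (np.sum(null_scores >= real_score) + 1) / (n_trials + 1)
    return real_score, p_null

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 1))
    y = 0.8 * X[:, 0] + rng.normal(size=200)  # data with a real signal
    corr = lambda X, y: abs(np.corrcoef(X[:, 0], y)[0, 1])
    score, p_null = null_dataset_check(corr, X, y, rng=1)
    print(f"effect={score:.2f}, null fraction={p_null:.3f}")
```

An agent's claimed discovery that survives such a null-data rerun is at least not trivially explainable as pattern-matching on noise; one that doesn't is exactly the kind of falsely optimistic conclusion the paper warns about.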