AI & ML Paradigm Challenge

AI 'scientists' are often just hallucinating patterns in random noise and telling you they're 99% sure about it.

April 14, 2026

Original Paper

Sanity Checks for Agentic Data Science

Zachary T. Rewolinski, Austin V. Zane, Hao Huang, Chandan Singh, Chenglong Wang, Jianfeng Gao, Bin Yu

arXiv · 2604.11003

The Takeaway

The research shows that data science agents frequently reach affirmative conclusions from pure noise and are poorly self-calibrated about their confidence. This motivates a new 'falsifiability' check for any scientific discovery driven by AI agents.

From the abstract

Agentic data science (ADS) pipelines have grown rapidly in both capability and adoption, with systems such as OpenAI Codex now able to directly analyze datasets and produce answers to statistical questions. However, these systems can reach falsely optimistic conclusions that are difficult for users to detect. To address this, we propose a pair of lightweight sanity checks grounded in the Predictability-Computability-Stability (PCS) framework for veridical data science. These checks use reasonable …
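To make the failure mode concrete: a minimal sketch of one generic falsifiability-style check, not the paper's actual PCS checks, is a label-permutation null test. If an agent reports a "discovery," the same statistic recomputed on shuffled labels should not look comparable; on pure noise, the finding should fail this check. The function and variable names below are illustrative assumptions.

```python
import numpy as np

def permutation_sanity_check(x, y, stat_fn, n_perm=1000, alpha=0.05, seed=0):
    """Flag a claimed finding as suspect unless its statistic is extreme
    relative to a label-permutation null distribution."""
    rng = np.random.default_rng(seed)
    observed = stat_fn(x, y)
    # Build an empirical null by recomputing the statistic on shuffled labels.
    null = np.array([stat_fn(x, rng.permutation(y)) for _ in range(n_perm)])
    # Empirical two-sided p-value (add-one correction avoids p = 0).
    p = (1 + np.sum(np.abs(null) >= abs(observed))) / (1 + n_perm)
    return p, bool(p < alpha)

# Pure noise: an apparent correlation should NOT survive the check.
rng = np.random.default_rng(42)
x, y = rng.normal(size=200), rng.normal(size=200)
corr = lambda a, b: np.corrcoef(a, b)[0, 1]
p, survives = permutation_sanity_check(x, y, corr)
```

An agent that is well calibrated would report `survives=False` here; an agent hallucinating patterns in noise would assert the correlation anyway, which is exactly the behavior such a check is meant to catch.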