Frontier LLMs lack 'scientific intuition' and can't tell the difference between a predictable result and a novel one.
April 15, 2026
Original Paper
SciPredict: Can LLMs Predict the Outcomes of Scientific Experiments in Natural Sciences?
arXiv · 2604.10718
The Takeaway
Despite the hype about AI accelerating science, this study shows that LLMs cannot reliably predict experimental outcomes or distinguish 'obvious' results from 'novel' ones. Unlike human experts, they lack the calibration to know what they don't know in a lab setting, which challenges the idea that scaling alone will eventually 'solve' science. For researchers, the practical upshot is that AI is currently a better 'assistant' for literature search than a 'partner' for hypothesis generation. We still need humans to decide what's worth testing in the physical world, because AI doesn't yet understand the 'cost' of a physical experiment.
From the abstract
Accelerating scientific discovery requires the identification of which experiments would yield the best outcomes before committing resources to costly physical validation. While existing benchmarks evaluate LLMs on scientific knowledge and reasoning, their ability to predict experimental outcomes - a task where AI could significantly exceed human capabilities - remains largely underexplored. We introduce SciPredict, a benchmark comprising 405 tasks derived from recent empirical studies in 33 spe