The math used to decide if a scientific study is replicable is so broken that the label itself cannot be replicated.
Statistical tools for checking scientific reliability can be shakier than the studies they aim to assess. The usual reading is that when a second study fails to match the first, the original result was a fluke. This analysis shows that irreducible variance between non-exact experiments makes such binary pass-fail labels statistically unreliable: even a perfectly true finding will fail a replication test a substantial fraction of the time through chance alone. The replication crisis may therefore be partly a product of flawed statistics rather than flawed science.
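The claim that a true finding can still "fail" replication by chance alone can be illustrated with a minimal Monte Carlo sketch. The numbers below (effect size 0.4, n = 50 per arm, two-sided α = 0.05, a normal-approximation z-test) are illustrative assumptions, not parameters from the paper; they give a replication with roughly 50% power, typical of many real replication attempts.

```python
import random
import statistics

def run_experiment(true_effect, n, sd=1.0, rng=None):
    """Simulate one two-arm experiment; return True if 'significant'
    under a two-sided z-test approximation at alpha = 0.05."""
    rng = rng or random
    treat = [rng.gauss(true_effect, sd) for _ in range(n)]
    ctrl = [rng.gauss(0.0, sd) for _ in range(n)]
    diff = statistics.mean(treat) - statistics.mean(ctrl)
    se = (sd * sd / n + sd * sd / n) ** 0.5
    return abs(diff / se) > 1.96

rng = random.Random(0)
trials = 10_000
# The effect is genuinely real, yet the replication has only ~50% power,
# so the binary pass-fail verdict flips on roughly half of the runs.
fails = sum(not run_experiment(true_effect=0.4, n=50, rng=rng)
            for _ in range(trials))
print(f"True effect, yet replication 'fails' in {fails / trials:.0%} of runs")
```

Nothing here is pathological: the original finding is true and the test is applied correctly, but dichotomizing a noisy estimate into "replicated" or "not replicated" discards the uncertainty that the abstract's models make explicit.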
The Difference Between "Replicable" and "Not Replicable" Is Not Itself Scientifically Replicable
arXiv · 2604.26268
Replication studies estimate the replicability rate of scientific results by aggregating binary verdicts of experiments. Exact replications are rarely attainable, so most replication sequences are non-exact. Experiments differ in ways that matter and do not share a single data-generating process. We formalize two statistical interpretations of non-exactness. In a shared latent rate (benchmark) model, experiments are exchangeable and depend on a common random replicability rate. In a conditionall
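The shared latent rate (benchmark) model described above can be sketched in a few lines: a single common replicability rate is drawn once, and each exchangeable experiment then contributes a binary verdict conditional on that rate. The Beta prior and its parameters below are illustrative assumptions for the sketch, not a specification from the abstract.

```python
import random

def latent_rate_verdicts(k, a=2.0, b=2.0, rng=None):
    """Shared latent rate sketch: one common replicability rate p is drawn
    once (here from an assumed Beta(a, b) prior), then k exchangeable
    experiments yield independent Bernoulli(p) pass-fail verdicts."""
    rng = rng or random
    p = rng.betavariate(a, b)                      # common random rate
    verdicts = [rng.random() < p for _ in range(k)]
    return p, verdicts

rng = random.Random(1)
p, verdicts = latent_rate_verdicts(k=20, rng=rng)
estimate = sum(verdicts) / len(verdicts)
print(f"latent rate p = {p:.2f}, aggregate verdict rate = {estimate:.2f}")
```

Because all verdicts share the one random draw of p, the aggregate verdict rate estimates that latent rate but inherits its randomness: rerunning the whole sequence with a fresh draw can yield a very different "replicability rate" even though nothing about the underlying science changed.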