The math used to decide if a scientific study is replicable is so broken that the label itself cannot be replicated.
Statistical tools for checking scientific reliability can be shakier than the studies they aim to assess. The usual reading is that when a second study fails to match the first, the original result was a fluke. This analysis shows that irreducible variance between non-exact experiments makes such binary pass-fail labels statistically unreliable: even a perfectly true finding will fail a replication test a substantial fraction of the time through chance alone. The replication crisis may therefore be partly a product of flawed statistics rather than flawed science.
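The claim that a true finding can still "fail" replication by chance alone can be illustrated with a minimal Monte Carlo sketch. The numbers below (effect size 0.4, n = 50 per arm, two-sided α = 0.05, a normal-approximation z-test) are illustrative assumptions, not parameters from the paper; they give a replication with roughly 50% power, typical of many real replication attempts.

```python
import random
import statistics

def run_experiment(true_effect, n, sd=1.0, rng=None):
    """Simulate one two-arm experiment; return True if 'significant'
    under a two-sided z-test approximation at alpha = 0.05."""
    rng = rng or random
    treat = [rng.gauss(true_effect, sd) for _ in range(n)]
    ctrl = [rng.gauss(0.0, sd) for _ in range(n)]
    diff = statistics.mean(treat) - statistics.mean(ctrl)
    se = (sd * sd / n + sd * sd / n) ** 0.5
    return abs(diff / se) > 1.96

rng = random.Random(0)
trials = 10_000
# The effect is genuinely real, yet the replication has only ~50% power,
# so the binary pass-fail verdict flips on roughly half of the runs.
fails = sum(not run_experiment(true_effect=0.4, n=50, rng=rng)
            for _ in range(trials))
print(f"True effect, yet replication 'fails' in {fails / trials:.0%} of runs")
```

Nothing here is pathological: the original finding is true and the test is applied correctly, but dichotomizing a noisy estimate into "replicated" or "not replicated" discards the uncertainty that the abstract's models make explicit.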
The Difference Between "Replicable" and "Not Replicable" Is Not Itself Scientifically Replicable
arXiv · 2604.26268
Replication studies estimate the replicability rate of scientific results by aggregating binary verdicts of experiments. Exact replications are rarely attainable, so most replication sequences are non-exact. Experiments differ in ways that matter and do not share a single data-generating process. We formalize two statistical interpretations of non-exactness. In a shared latent rate (benchmark) model, experiments are exchangeable and depend on a common random replicability rate. In a conditionall
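The shared latent rate (benchmark) model described above can be sketched in a few lines: a single common replicability rate is drawn once, and each exchangeable experiment then contributes a binary verdict conditional on that rate. The Beta prior and its parameters below are illustrative assumptions for the sketch, not a specification from the abstract.

```python
import random

def latent_rate_verdicts(k, a=2.0, b=2.0, rng=None):
    """Shared latent rate sketch: one common replicability rate p is drawn
    once (here from an assumed Beta(a, b) prior), then k exchangeable
    experiments yield independent Bernoulli(p) pass-fail verdicts."""
    rng = rng or random
    p = rng.betavariate(a, b)                      # common random rate
    verdicts = [rng.random() < p for _ in range(k)]
    return p, verdicts

rng = random.Random(1)
p, verdicts = latent_rate_verdicts(k=20, rng=rng)
estimate = sum(verdicts) / len(verdicts)
print(f"latent rate p = {p:.2f}, aggregate verdict rate = {estimate:.2f}")
```

Because all verdicts share the one random draw of p, the aggregate verdict rate estimates that latent rate but inherits its randomness: rerunning the whole sequence with a fresh draw can yield a very different "replicability rate" even though nothing about the underlying science changed.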