Life Science Paradigm Challenge

An audit of the world's largest immunology database found that 70% of the data was generated by AI models rather than experiments, creating a massive 'echo chamber.'

April 1, 2026

Original Paper

Resolution of recursive data corruption to transform T-cell epitope discovery

Preibisch, G.; Tyrolski, M.; Kucharski, P.; Gizinski, S.; Grzegorczyk, P.; Moon, S.; Kim, S.; Zaro, B.; Gambin, A.

bioRxiv · 10.64898/2026.03.30.710191

The Takeaway

The audit points to a systemic error in vaccine and T-cell therapy design: new AI models are being trained on the biased output of older models rather than on experimentally verified biology. According to the authors, this explains why gains reported on in silico benchmarks have not translated into clinical successes.

From the abstract

Accurate prediction of MHC class I-presented peptides is essential for any vaccine or T-cell therapy design, yet reported gains on in silico benchmarks have not translated into clinical successes. We show that this discrepancy comes from a methodological error: immunopeptidomics datasets are fundamentally contaminated by existing prediction models through prediction-based deconvolution and filtering - an iterative confirmation bias. An audit of the IEDB, the biggest database in the field, reveal