An audit of the world's largest immunology database found that 70% of the data was generated by AI models rather than experiments, creating a massive 'echo chamber.'
April 1, 2026
Original Paper
Resolution of recursive data corruption to transform T-cell epitope discovery
bioRxiv · 10.64898/2026.03.30.710191
The Takeaway
The audit points to a systemic error in vaccine and T-cell therapy design: new AI models are being trained on the biased output of older ones rather than on experimentally measured biology. According to the authors, this explains why gains reported on computational benchmarks have not translated into clinical successes.
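The feedback loop can be illustrated with a toy sketch. Everything in it is hypothetical and chosen only for illustration: the peptides, the single-residue "legacy predictor," and the function names are assumptions, not the paper's method or any real tool's API. It shows how filtering observed peptides through an existing model leaves a curated set that contains only what the model already believed.

```python
# Toy illustration only: peptides, motif rule, and function names are hypothetical.

def old_predictor(peptide: str) -> bool:
    """Hypothetical legacy model: accepts only peptides with leucine at position 2."""
    return peptide[1] == "L"


def prediction_based_filter(observed, predictor):
    """Keep only the observed peptides that the existing predictor already accepts."""
    return [p for p in observed if predictor(p)]


def anchor_frequencies(peptides):
    """Fraction of peptides carrying each residue at position 2."""
    counts = {}
    for p in peptides:
        counts[p[1]] = counts.get(p[1], 0) + 1
    total = len(peptides)
    return {aa: n / total for aa, n in counts.items()}


# Hypothetical experimentally observed ligands: half carry L and half carry M at position 2.
observed = [
    "ALAKAAAAV", "GLYDGMEHL", "KLGGALQAK", "SLYNTVATL",
    "AMAKAAAAV", "GMYDGMEHL", "KMGGALQAK", "SMYNTVATL",
]

# Curating the data with the old model silently discards every M-anchored ligand.
curated = prediction_based_filter(observed, old_predictor)

raw_freq = anchor_frequencies(observed)
curated_freq = anchor_frequencies(curated)

print("before filtering:", raw_freq)
print("after filtering: ", curated_freq)
```

In this toy setting, a new model trained on `curated` would never see an M-anchored ligand, and a benchmark drawn from the same curated data would reward it for reproducing the old model's blind spot.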
From the abstract
Accurate prediction of MHC class I-presented peptides is essential for any vaccine or T-cell therapy design, yet reported gains on in silico benchmarks have not translated into clinical successes. We show that this discrepancy comes from a methodological error: immunopeptidomics datasets are fundamentally contaminated by existing prediction models through prediction-based deconvolution and filtering - an iterative confirmation bias. An audit of the IEDB, the biggest database in the field, reveal