Weirdly enough, AI trained on 'fake' data is actually better at predicting real pandemics than AI trained on actual history.
March 26, 2026
Original Paper
Leveraging Synthetic and Genetic Data to Improve Epidemic Forecasting
arXiv · 2603.24474
The Takeaway
We usually assume real-world data is the gold standard for training AI, but early in a pandemic, that data is often too messy and incomplete to be reliable. The researchers found that models trained on high-quality synthetic data, combined with information on the virus's own genetic mutations, outperformed the official forecasting ensembles used by leading global experts.
From the abstract
Forecasting infectious disease outbreaks is hard. Forecasting emerging infectious diseases with limited historical data is even harder. In this paper, we investigate ways to improve emerging infectious disease forecasting under operational constraints. Specifically, we explore two options likely to be available near the start of an emerging disease outbreak: synthetic data and genetic information. For this investigation, we conducted an experiment where we trained deep learning models on differe