AI & ML Paradigm Challenge

Weirdly enough, AI trained on 'fake' data is actually better at predicting real pandemics than AI trained on actual history.

March 26, 2026

Original Paper

Leveraging Synthetic and Genetic Data to Improve Epidemic Forecasting

Dave Osthus, Alexander C. Murph, Emma E. Goldberg, Lauren J. Beesley, William M. Fischer, Nidhi K. Parikh, Lauren A. Castro

arXiv · 2603.24474

The Takeaway

We usually assume real-world data is the gold standard for training AI, but early in a pandemic that data is often too messy and incomplete to be reliable. This research found that models trained on high-quality synthetic data, combined with genetic information about the virus, outperformed the official forecasting ensembles used by leading global experts.

From the abstract

Forecasting infectious disease outbreaks is hard. Forecasting emerging infectious diseases with limited historical data is even harder. In this paper, we investigate ways to improve emerging infectious disease forecasting under operational constraints. Specifically, we explore two options likely to be available near the start of an emerging disease outbreak: synthetic data and genetic information. For this investigation, we conducted an experiment where we trained deep learning models on differe