Massive AI foundation models for chemistry actually perform worse at predicting molecular properties than small, specialized models.
The "bigger is better" rule of LLMs does not carry over to drug discovery. This benchmark assessment found that compact, domain-specific models consistently outperformed giant, multi-billion-parameter models, and that pre-training on massive amounts of chemical data confers no universal advantage on specific scientific tasks. The result challenges the current trend of building one giant AI for everything: small, efficient, specialized models remain the best tools for the hardest problems in chemistry.
Do Larger Models Really Win in Drug Discovery? A Benchmark Assessment of Model Scaling in AI-Driven Molecular Property and Activity Prediction
bioRxiv · 10.64898/2026.04.29.721568
The rapid growth of molecular foundation models and general-purpose large language models has encouraged a scale-centric view of artificial intelligence in drug discovery, in which larger pretrained models are expected to supersede compact cheminformatics models and task-specific graph neural networks (GNNs). We test this assumption on 22 molecular property and activity endpoints, including public ADMET and Tox21 benchmarks and two internal anti-infective activity datasets. Across 167,056 held-out