AI & ML Nature Is Weird

AlphaFold 3 appears to rely less on the sheer volume of sequence data than on how evolutionarily divergent the related sequences are.

April 25, 2026

Original Paper

AlphaInterp: Probing AlphaFold 3's Internal Representations Reveals Evolutionary Determinants of Predicted Structure and Confidence

bioRxiv · 10.64898/2026.04.22.720175

The Takeaway

According to the paper, AlphaFold 3's structure predictions depend more on the phylogenetic diversity of the sequences available to it than on their total number. A few highly divergent homologs give the model more structural information than thousands of nearly identical ones. That runs against the common assumption that sheer data volume is the primary driver of AlphaFold's success. In effect, the model uses evolutionary comparison the way a biologist would, drawing the most signal from the most divergent relatives. For drug discovery and synthetic biology, this suggests that sequencing underrepresented, evolutionarily distant organisms may be worth more than sequencing additional close relatives of common ones.
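The difference between counting sequences and measuring their diversity can be made concrete with a toy calculation. The sketch below is not the paper's method; it uses a redundancy-weighted "effective sequence count", a convention often applied to alignments, in which each sequence is down-weighted by the number of near-identical neighbors it has. The function names, the 0.8 identity threshold, and the made-up sequences are all assumptions chosen for illustration.

```python
from typing import List


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def effective_count(msa: List[str], threshold: float = 0.8) -> float:
    """Redundancy-weighted sequence count (illustrative, hypothetical helper).

    Each sequence contributes 1 / n, where n is the number of sequences
    (itself included) at or above the identity threshold. The 0.8 default
    is an assumed convention, not a value taken from the paper.
    """
    total = 0.0
    for s in msa:
        neighbors = sum(1 for t in msa if identity(s, t) >= threshold)
        total += 1.0 / neighbors
    return total


# Made-up toy alignments: many copies of one sequence vs. three divergent ones.
redundant = ["MKTAYIAKQR"] * 500
diverse = ["MKTAYIAKQR", "MRSGWLVEQK", "LKTPFIDRHN"]

print(len(redundant), effective_count(redundant))
print(len(diverse), effective_count(diverse))
```

Under these assumptions, the 500 identical copies collapse to an effective count of about one, while the three divergent sequences each count in full. Raw depth and effective diversity can therefore point in opposite directions, which is the distinction the takeaway draws.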

From the abstract

AlphaFold 3 predicts the three-dimensional structures of proteins and their complexes with remarkable accuracy, yet the computations by which it converts evolutionary information into structure have remained opaque. Here, in the first systematic mechanistic interpretability analysis of AlphaFold 3, we show that the model relies predominantly on comparative evolutionary context rather than raw sequence, and that a few divergent homologs contribute more to accurate prediction than many near-identi