A foundation model for the brain can now predict human neural activity across sight, sound, and language in a digital simulation.
Studying the human brain usually requires expensive equipment and hundreds of volunteer hours. The new AI system, called TRIBE v2, can replicate decades' worth of real-world neuroscience experiments in a computer. It processes visual, auditory, and linguistic input to forecast how a human brain will respond to a given stimulus with high accuracy, letting researchers run thousands of in-silico experiments at a fraction of the cost of traditional methods. It marks a shift in which the biggest mysteries of the mind can be explored through rapid, scalable simulations rather than slow human trials.
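The general idea behind such systems is an "encoding model": features extracted from a stimulus (here, three modalities) are mapped to measured brain responses, and the fitted map is then used to predict responses to new stimuli. The following is a minimal sketch of that principle on synthetic data — not TRIBE v2 itself — with hypothetical feature sizes and a simple ridge-regression readout standing in for the real model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for per-timepoint stimulus embeddings (sizes are illustrative).
n_timepoints, n_voxels = 200, 50
video_feats = rng.standard_normal((n_timepoints, 32))
audio_feats = rng.standard_normal((n_timepoints, 16))
text_feats = rng.standard_normal((n_timepoints, 24))

# Simulated fMRI responses: a linear function of the stimuli plus noise.
X = np.hstack([video_feats, audio_feats, text_feats])  # fused tri-modal features
true_w = rng.standard_normal((X.shape[1], n_voxels))
Y = X @ true_w + 0.1 * rng.standard_normal((n_timepoints, n_voxels))

# Ridge-regression readout: one regularized linear map from features to voxels.
lam = 1.0
w_hat = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)
Y_pred = X @ w_hat

# Standard encoding-model score: per-voxel correlation between
# predicted and measured responses, averaged over voxels.
scores = [np.corrcoef(Y[:, v], Y_pred[:, v])[0, 1] for v in range(n_voxels)]
print(round(float(np.mean(scores)), 3))
```

In a real pipeline, the hand-rolled ridge readout would be replaced by a large learned model, and the synthetic responses by actual fMRI recordings; the evaluation logic — correlate predictions with held-out brain activity — is the same.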
A foundation model of vision, audition, and language for in-silico neuroscience
arXiv · 2605.04326
Cognitive neuroscience is fragmented into specialized models, each tailored to a specific experimental paradigm, which prevents a unified model of cognition in the human brain. Here, we introduce TRIBE v2, a tri-modal (video, audio, and language) foundation model capable of predicting human brain activity across a variety of naturalistic and experimental conditions. Leveraging a unified dataset of over 1,000 hours of fMRI across 720 subjects, we demonstrate that our model accurately predicts high-re