SeriesFusion
Science, curated & edited by AI
Collision  /  AI

A theory of how the human brain avoids surprises just cut the error rate of AI models by 38% during sudden topic changes.

Mixture-of-Experts models often struggle with a cold-start problem when they switch between different domains or tasks. Integrating the Free Energy Principle lets the model anticipate these shifts and prepare its internal routing accordingly. Perplexity dropped from 6.56 to 4.01 when the model was allowed to minimize its internal surprise about new data. The approach bridges the gap between high-level neuroscience and practical transformer engineering, and it suggests that the next generation of efficient AI will need to behave more like a biological system that actively predicts its environment.
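
As a rough illustration of what "minimizing internal surprise" could look like inside a router, the sketch below adds a surprise term to an otherwise standard affinity gate: each expert keeps a running mean of the tokens it receives, and tokens are steered toward the expert whose running mean they least surprise. The class name SurpriseAwareGate, the momentum constant, and the squared-distance surprise measure are illustrative assumptions, not the paper's formulation.

import torch
import torch.nn.functional as F

class SurpriseAwareGate(torch.nn.Module):
    # Hypothetical sketch, not the paper's implementation.
    def __init__(self, d_model: int, n_experts: int, momentum: float = 0.99):
        super().__init__()
        self.affinity = torch.nn.Linear(d_model, n_experts, bias=False)
        # Running mean of the tokens each expert has recently received
        # (a crude "internal model" of what that expert expects to see).
        self.register_buffer("expert_mean", torch.zeros(n_experts, d_model))
        self.momentum = momentum

    def forward(self, x):
        # x: (batch, d_model) token representations
        logits = self.affinity(x)                         # standard affinity scores
        # Surprise: squared distance between each token and each expert's
        # running mean; subtracting it favors experts that "expected" the token.
        surprise = torch.cdist(x, self.expert_mean) ** 2  # (batch, n_experts)
        probs = F.softmax(logits - surprise, dim=-1)
        # Update each expert's running mean toward the tokens it receives.
        with torch.no_grad():
            weights = probs / (probs.sum(dim=0, keepdim=True) + 1e-9)
            batch_mean = weights.t() @ x                  # (n_experts, d_model)
            self.expert_mean.mul_(self.momentum).add_((1.0 - self.momentum) * batch_mean)
        return probs

A router like this would still pick its top-k experts from the returned probabilities as usual; the only change is that the routing score now reflects a prediction about the token, not just a learned affinity projection.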

Original Paper

Affinity Is Not Enough: Recovering the Free Energy Principle in Mixture-of-Experts

Man Yung Wong

arXiv  ·  2605.00604

Sparse MoE routing fails at domain transitions, where the current token belongs to one distribution and the next to another. In a controlled experiment (4 experts, 5 seeds), standard affinity routing assigns only 0.006 ± 0.001 probability to the correct expert at the transition. Three lightweight gate modifications raise this to 0.748 ± 0.002 (124×), cutting the number of experts needed for 99% coverage from infeasible to a small constant: temporal memory (beta), a per-expert LIF membrane potential accumulator, …
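
To make the two gate modifications named above more concrete, here is a minimal sketch under one plausible reading: "temporal memory (beta)" as an exponential moving average of the routing logits, and the per-expert LIF membrane potential as a leaky accumulator of affinity that resets when it crosses a firing threshold. The function name, the constants, and the way the two signals are combined are assumptions for illustration, not the paper's exact gate.

import torch
import torch.nn.functional as F

def lif_temporal_gate(affinity_seq, beta=0.9, leak=0.95, threshold=1.0):
    """affinity_seq: (seq_len, n_experts) raw affinity logits, one row per token.

    Returns (seq_len, n_experts) routing probabilities that carry state across
    tokens, so a recently useful expert stays "warm" across a domain transition.
    """
    n_experts = affinity_seq.shape[-1]
    memory = torch.zeros(n_experts)     # temporal memory: EMA of logits (beta)
    potential = torch.zeros(n_experts)  # per-expert LIF membrane potential
    probs = []
    for logits in affinity_seq:
        # Temporal memory: blend the current affinity with the recent past.
        memory = beta * memory + (1.0 - beta) * logits
        # LIF accumulator: leak, integrate the current affinity, then reset
        # any expert whose potential crosses the firing threshold.
        potential = leak * potential + logits
        fired = potential >= threshold
        potential = torch.where(fired, torch.zeros_like(potential), potential)
        # Route on the remembered affinity plus a bonus for experts that fired.
        probs.append(F.softmax(memory + fired.float(), dim=-1))
    return torch.stack(probs)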