Mathematically proves that the Transformer architecture is functionally equivalent to a Bayesian network performing loopy belief propagation.
March 19, 2026
Original Paper
Transformers are Bayesian Networks
arXiv · 2603.17063
The Takeaway
By mapping attention to 'AND' gates and feed-forward networks to 'OR' gates, this paper provides a new interpretability lens: every Transformer layer is one round of belief propagation. This offers a rigorous framework for verifiable inference in neural networks.
From the abstract
Transformers are the dominant architecture in AI, yet why they work remains poorly understood. This paper offers a precise answer: a transformer is a Bayesian network. We establish this in five ways. First, we prove that every sigmoid transformer with any weights implements weighted loopy belief propagation on its implicit factor graph. One layer is one round of BP. This holds for any weights -- trained, random, or constructed. Formally verified against standard mathematical axioms. Second, we giv
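To ground the claim that one layer corresponds to one round of belief propagation, here is a minimal sketch of loopy sum-product BP on a pairwise factor graph. This is not the paper's construction: the 3-variable cycle, the potentials, and the unary evidence are illustrative assumptions, chosen only to show what a single synchronous round of message passing computes.

```python
import numpy as np

# Hedged sketch (not the paper's code): loopy sum-product belief propagation
# on a 3-variable binary cycle, where one synchronous message update plays
# the role of one "layer". Potentials and evidence are illustrative.

pairs = [(0, 1), (1, 2), (2, 0)]
psi = {p: np.array([[1.2, 0.8], [0.8, 1.2]]) for p in pairs}   # attractive couplings
phi = {0: np.array([0.8, 0.2]), 1: np.ones(2), 2: np.ones(2)}  # evidence on variable 0

# msgs[(i, j)]: message from variable i to variable j (length-2 vector)
msgs = {(i, j): np.ones(2) for i, j in pairs}
msgs.update({(j, i): np.ones(2) for i, j in pairs})

def bp_round(msgs):
    """One synchronous round of BP message updates, i.e. one 'layer'."""
    new = {}
    for (i, j) in msgs:
        pot = psi[(i, j)] if (i, j) in psi else psi[(j, i)].T  # pot[x_i, x_j]
        incoming = phi[i].copy()
        for (k, l), m in msgs.items():
            if l == i and k != j:          # messages into i, excluding j's
                incoming *= m
        out = pot.T @ incoming             # marginalize over x_i
        new[(i, j)] = out / out.sum()      # normalize for numerical stability
    return new

for _ in range(20):                        # 20 rounds ~ 20 "layers"
    msgs = bp_round(msgs)

beliefs = []
for v in range(3):                         # belief = unary * incoming messages
    b = phi[v].copy()
    for (k, l), m in msgs.items():
        if l == v:
            b *= m
    beliefs.append(b / b.sum())

print([b.round(3).tolist() for b in beliefs])
```

With attractive couplings and evidence pinning variable 0 toward its first state, repeated rounds propagate that bias to the neighboring variables, much as stacked layers iteratively refine token representations in the paper's reading.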