Mathematically proves that the Transformer architecture is functionally equivalent to a Bayesian network performing loopy belief propagation.
March 19, 2026
Original Paper
Transformers are Bayesian Networks
arXiv · 2603.17063
The Takeaway
By mapping attention to 'AND' gates and feed-forward networks to 'OR' gates, this paper provides a new interpretability lens: every Transformer layer is one round of belief propagation. This offers a rigorous framework for verifiable inference in neural networks.
From the abstract
Transformers are the dominant architecture in AI, yet why they work remains poorly understood. This paper offers a precise answer: a transformer is a Bayesian network. We establish this in five ways. First, we prove that every sigmoid transformer with any weights implements weighted loopy belief propagation on its implicit factor graph. One layer is one round of BP. This holds for any weights -- trained, random, or constructed. Formally verified against standard mathematical axioms. Second, we giv
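To ground the claim that one layer corresponds to one round of belief propagation, here is a minimal sketch of loopy sum-product BP on a pairwise factor graph. This is not the paper's construction: the 3-variable cycle, the potentials, and the unary evidence are illustrative assumptions, chosen only to show what a single synchronous round of message passing computes.

```python
import numpy as np

# Hedged sketch (not the paper's code): loopy sum-product belief propagation
# on a 3-variable binary cycle, where one synchronous message update plays
# the role of one "layer". Potentials and evidence are illustrative.

pairs = [(0, 1), (1, 2), (2, 0)]
psi = {p: np.array([[1.2, 0.8], [0.8, 1.2]]) for p in pairs}   # attractive couplings
phi = {0: np.array([0.8, 0.2]), 1: np.ones(2), 2: np.ones(2)}  # evidence on variable 0

# msgs[(i, j)]: message from variable i to variable j (length-2 vector)
msgs = {(i, j): np.ones(2) for i, j in pairs}
msgs.update({(j, i): np.ones(2) for i, j in pairs})

def bp_round(msgs):
    """One synchronous round of BP message updates, i.e. one 'layer'."""
    new = {}
    for (i, j) in msgs:
        pot = psi[(i, j)] if (i, j) in psi else psi[(j, i)].T  # pot[x_i, x_j]
        incoming = phi[i].copy()
        for (k, l), m in msgs.items():
            if l == i and k != j:          # messages into i, excluding j's
                incoming *= m
        out = pot.T @ incoming             # marginalize over x_i
        new[(i, j)] = out / out.sum()      # normalize for numerical stability
    return new

for _ in range(20):                        # 20 rounds ~ 20 "layers"
    msgs = bp_round(msgs)

beliefs = []
for v in range(3):                         # belief = unary * incoming messages
    b = phi[v].copy()
    for (k, l), m in msgs.items():
        if l == v:
            b *= m
    beliefs.append(b / b.sum())

print([b.round(3).tolist() for b in beliefs])
```

With attractive couplings and evidence pinning variable 0 toward its first state, repeated rounds propagate that bias to the neighboring variables, much as stacked layers iteratively refine token representations in the paper's reading.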