Transformers, diffusion maps, and magnetic Laplacians turn out to be different regimes of the same underlying Markov geometry.
April 14, 2026
Original Paper
The Diffusion-Attention Connection
arXiv · 2604.09560
The Takeaway
The paper unifies Transformers, diffusion maps, and magnetic Laplacians into a single Markov geometry derived from pre-softmax query scores. Through this shared geometric lens, ideas developed for diffusion, such as stable sampling schedules, could plausibly carry over to Transformer attention mechanisms.
From the abstract
Transformers, diffusion-maps, and magnetic Laplacians are usually treated as separate tools; we show they are all different regimes of a single Markov geometry built from pre-softmax query-scores. We define a QK "bidivergence" whose exponentiated and normalized forms yield attention, diffusion-maps, and magnetic diffusion, and we use products of experts and Schrödinger bridges to connect and organize them into equilibrium, nonequilibrium steady-state, and driven dynamics.
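To make the shared construction concrete, here is a minimal NumPy sketch (not from the paper) of how the same pre-softmax QK scores can be exponentiated and normalized in three ways: row normalization gives attention, a symmetric density-corrected normalization gives a diffusion-map operator, and keeping the antisymmetric part of the scores as a complex phase gives a magnetic-diffusion-style operator. The particular symmetrization, density correction, and phase construction are illustrative assumptions, not the paper's exact definitions of the bidivergence.

```python
import numpy as np

# Toy queries and keys; the construction below only assumes pre-softmax scores.
rng = np.random.default_rng(0)
n, d = 6, 4
Q = rng.normal(size=(n, d))
K = rng.normal(size=(n, d))

# Pre-softmax query scores (standard scaled dot-product).
S = Q @ K.T / np.sqrt(d)

# 1) Attention regime: exponentiate and row-normalize (softmax over keys).
A = np.exp(S - S.max(axis=1, keepdims=True))
attention = A / A.sum(axis=1, keepdims=True)           # row-stochastic Markov matrix

# 2) Diffusion-map regime: symmetrize the scores into a kernel, apply a
#    density correction W -> D^{-1} W D^{-1}, then row-normalize (assumption).
W = np.exp((S + S.T) / 2)
deg = W.sum(axis=1)
W_corr = W / np.outer(deg, deg)
P_diff = W_corr / W_corr.sum(axis=1, keepdims=True)     # diffusion-map Markov operator

# 3) Magnetic regime: keep the antisymmetric part of the scores as a phase,
#    giving a Hermitian "magnetic" kernel (assumption for illustration).
phase = (S - S.T) / 2
W_mag = np.exp((S + S.T) / 2) * np.exp(1j * phase)
P_mag = W_mag / np.abs(W_mag).sum(axis=1, keepdims=True)

print(attention.sum(axis=1))   # each row sums to 1
print(P_diff.sum(axis=1))      # each row sums to 1
```

All three operators are built from the same score matrix S; only the normalization and the treatment of its symmetric versus antisymmetric parts change, which is the sense in which the abstract speaks of different regimes of one geometry.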