AI & ML Breaks Assumption

Identifies architectural 'stream separation' as the key to making linear safety interventions effective.

March 24, 2026

Original Paper

Stream separation improves Bregman conditioning in transformers

James Clayton Kerce

arXiv · 2603.21317

The Takeaway

Standard transformers exhibit severe geometric degeneracy at intermediate layers, which makes linear steering methods (probing, activation engineering, concept erasure) unreliable. The author shows that separating streams significantly improves the conditioning of the softmax-induced Bregman metric, a useful result for researchers working on mechanistic interpretability and reliable model alignment.
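To make "conditioning" concrete: the metric in question is the Hessian of the softmax log-normalizer, which equals a covariance under the softmax distribution. The sketch below is a toy illustration on random data, not the paper's method or code. It assumes that `gamma` holds one vector per token (a hypothetical unembedding matrix) and `lam` is a representation vector; the function names and sizes are invented for illustration. It builds the covariance form of the metric, checks it against a finite-difference Hessian of the log-normalizer, and reports the condition number.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D logit vector.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def log_normalizer(lam, gamma):
    # A(lam) = log sum_y exp(gamma_y . lam), computed stably.
    z = gamma @ lam
    m = z.max()
    return m + np.log(np.exp(z - m).sum())

def bregman_metric(lam, gamma):
    # Covariance of the token vectors under the softmax distribution at lam.
    p = softmax(gamma @ lam)
    centered = gamma - p @ gamma
    return (centered * p[:, None]).T @ centered

def numerical_hessian(f, x, eps=1e-4):
    # Central finite-difference Hessian, used only as a sanity check.
    d = x.size
    hess = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            ei = np.zeros(d); ei[i] = eps
            ej = np.zeros(d); ej[j] = eps
            hess[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                          - f(x - ei + ej) + f(x - ei - ej)) / (4 * eps**2)
    return hess

# Toy setup (assumed sizes): 50 tokens, 4-dimensional representations.
rng = np.random.default_rng(0)
gamma = rng.normal(size=(50, 4))
lam = rng.normal(size=4)

H = bregman_metric(lam, gamma)
H_fd = numerical_hessian(lambda x: log_normalizer(x, gamma), lam)
cond = np.linalg.cond(H)

print("max |H - H_fd|:", np.abs(H - H_fd).max())
print("condition number of H:", cond)
```

A large condition number means some directions in representation space change the output distribution far more than others, so a Euclidean step of fixed length has very uneven effects depending on its direction.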

From the abstract

Linear methods for steering transformer representations, including probing, activation engineering, and concept erasure, implicitly assume the geometry of representation space is Euclidean. Park et al. [Park et al., 2026] showed that softmax induces a curved Bregman geometry whose metric tensor is the Hessian of the log-normalizer, $H(\lambda) = \mathrm{Cov}[\gamma \mid \lambda]$. Ignoring this curvature causes Euclidean steering to leak probability mass to unintended tokens. Their analysis applies at
