Identifies architectural 'stream separation' as the key to making linear safety interventions effective.
March 24, 2026
Original Paper
Stream separation improves Bregman conditioning in transformers
arXiv · 2603.21317
The Takeaway
Standard transformers exhibit severe geometric degeneracy at intermediate layers. This makes linear methods for steering representations (probing, activation engineering, concept erasure) unreliable there. The authors show that separating streams significantly improves the conditioning of the underlying Bregman geometry, a result that matters for researchers working on mechanistic interpretability and reliable model alignment.
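The takeaway does not say how conditioning is quantified. One common choice, assumed here purely for illustration and not taken from the paper, is the ratio of the largest to the smallest nonzero eigenvalue of the softmax metric $H(\lambda) = \operatorname{diag}(p) - pp^\top$, the covariance form of the metric quoted in the abstract. The NumPy sketch below computes this ratio for a hypothetical uniform distribution and for a hypothetical distribution whose probabilities span several orders of magnitude. It does not model transformer layers or stream separation, and all function names are illustrative.

```python
import numpy as np

def softmax(lam):
    # Numerically stable softmax over a 1-D array of logits
    lam = np.asarray(lam, dtype=float)
    z = np.exp(lam - lam.max())
    return z / z.sum()

def bregman_metric(lam):
    # H(lam) = diag(p) - p p^T: the Hessian of log-sum-exp, i.e. Cov[gamma | lam]
    p = softmax(lam)
    return np.diag(p) - np.outer(p, p)

def restricted_condition_number(H):
    # Illustrative conditioning measure (an assumption, not the paper's metric).
    # H is positive semidefinite and, when every p_i > 0, has exactly one null
    # direction: the all-ones vector, since adding a constant to every logit
    # leaves softmax unchanged. eigvalsh returns eigenvalues in ascending order,
    # so drop the first (zero) one and take max / min of the rest.
    eig = np.linalg.eigvalsh(H)
    nonzero = eig[1:]
    return nonzero[-1] / nonzero[0]

if __name__ == "__main__":
    flat = np.zeros(4)                          # hypothetical: uniform distribution
    spread = np.array([0.0, -3.0, -6.0, -9.0])  # hypothetical: probabilities span ~4 orders of magnitude
    for name, lam in [("flat", flat), ("spread", spread)]:
        k = restricted_condition_number(bregman_metric(lam))
        print(f"{name}: kappa = {k:.1f}")
```

Because the eigenvalues of $\operatorname{diag}(p) - pp^\top$ interlace with the sorted probabilities $p_i$, the uniform case gives a ratio of 1. When tail probabilities differ by orders of magnitude, the ratio is forced to be much larger.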
From the abstract
Linear methods for steering transformer representations, including probing, activation engineering, and concept erasure, implicitly assume the geometry of representation space is Euclidean. Park et al. (2026) showed that softmax induces a curved Bregman geometry whose metric tensor is the Hessian of the log-normalizer, $H(\lambda) = \mathrm{Cov}[\gamma \mid \lambda]$. Ignoring this curvature causes Euclidean steering to leak probability mass to unintended tokens. Their analysis applies at
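The abstract's identity $H(\lambda) = \mathrm{Cov}[\gamma \mid \lambda]$ can be checked numerically. The following is a minimal sketch that rests on our reading of the notation, which the excerpt does not confirm. It assumes $\lambda$ is a vector of logits, the log-normalizer is $A(\lambda) = \log \sum_i e^{\lambda_i}$, and $\gamma$ is a one-hot indicator of the sampled token. Under those assumptions it compares the closed-form covariance $\operatorname{diag}(p) - pp^\top$ against a finite-difference Hessian of $A$. The code is pure Python, and the logits and helper names are hypothetical.

```python
import math

def softmax(lam):
    # Numerically stable softmax: p_i = exp(lam_i) / sum_j exp(lam_j)
    m = max(lam)
    exps = [math.exp(x - m) for x in lam]
    z = sum(exps)
    return [e / z for e in exps]

def log_normalizer(lam):
    # A(lam) = log sum_i exp(lam_i), the softmax log-partition function
    m = max(lam)
    return m + math.log(sum(math.exp(x - m) for x in lam))

def covariance_metric(lam):
    # Cov[gamma | lam] for a one-hot gamma ~ Categorical(softmax(lam)) (assumed reading):
    # E[gamma gamma^T] = diag(p) and E[gamma] = p, so Cov = diag(p) - p p^T
    p = softmax(lam)
    n = len(p)
    return [[(p[i] if i == j else 0.0) - p[i] * p[j] for j in range(n)]
            for i in range(n)]

def numerical_hessian(f, lam, h=1e-4):
    # Central finite-difference estimate of the Hessian of f at lam
    n = len(lam)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def f_shifted(di, dj):
                x = list(lam)
                x[i] += di
                x[j] += dj
                return f(x)
            H[i][j] = (f_shifted(h, h) - f_shifted(h, -h)
                       - f_shifted(-h, h) + f_shifted(-h, -h)) / (4 * h * h)
    return H

if __name__ == "__main__":
    lam = [2.0, 0.5, -1.0, 0.0]  # hypothetical logits for a 4-token vocabulary
    H = covariance_metric(lam)
    H_num = numerical_hessian(log_normalizer, lam)
    max_err = max(abs(a - b) for ra, rb in zip(H, H_num) for a, b in zip(ra, rb))
    print(f"max |Cov - Hessian(A)| = {max_err:.2e}")
    # Each row sums to zero: shifting every logit by the same constant
    # leaves softmax unchanged, so the all-ones direction has zero curvature.
    print([round(sum(row), 12) for row in H])
```

The same matrix is also the Jacobian of softmax with respect to the logits. To first order, then, a logit perturbation $\delta\lambda$ shifts the token distribution by $H(\lambda)\,\delta\lambda$, which is why a step that looks uniform in Euclidean terms can move probability very unevenly across tokens.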