Achieves dense component attribution in SwiGLU Transformers with a single forward-backward pass, i.e., O(1) model evaluations rather than one per counterfactual.
March 23, 2026
Original Paper
Dual Path Attribution: Efficient Attribution for SwiGLU-Transformers through Layer-Wise Target Propagation
arXiv · 2603.19742
The Takeaway
Tracing information flow in LLMs is usually computationally prohibitive or requires many counterfactual passes. DPA analytically linearizes SwiGLU structures to provide state-of-the-art faithfulness and unprecedented efficiency, enabling real-time interpretability for very long sequences and complex architectures.
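To make "analytically linearizes SwiGLU structures" concrete, here is a minimal sketch assuming one common linearization strategy (freezing the gate activation at its observed value), not the paper's actual derivation. With the gate frozen, the remaining up-projection path is an exact linear map of the input at that point; the names (SwiGLU, linearize_swiglu, d_model, d_ff) are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """Standard SwiGLU feed-forward block: (SiLU(x W_g) * (x W_u)) W_d."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_ff, bias=False)
        self.w_up = nn.Linear(d_model, d_ff, bias=False)
        self.w_down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

def linearize_swiglu(block: SwiGLU, x_ref: torch.Tensor):
    """Locally linear surrogate of `block` around `x_ref`: the gate
    SiLU(x_ref W_g) is frozen, leaving y(x) = (g* * (x W_u)) W_d, which is
    exactly linear in x and matches block(x_ref) at x = x_ref.
    (Illustrative linearization, not DPA's derivation.)"""
    with torch.no_grad():
        gate = F.silu(block.w_gate(x_ref))   # frozen gate activations g*
    def linear_map(x: torch.Tensor) -> torch.Tensor:
        return block.w_down(gate * block.w_up(x))
    return linear_map

# Usage: exact agreement at the reference point, linearity elsewhere.
block = SwiGLU(d_model=16, d_ff=64)
x = torch.randn(2, 5, 16)
f_lin = linearize_swiglu(block, x)
assert torch.allclose(block(x), f_lin(x), atol=1e-6)
assert torch.allclose(f_lin(2 * x), 2 * f_lin(x), atol=1e-5)
```

The surrogate reproduces the real block exactly at the reference input while exposing a per-token linear map, which is the property a layer-wise propagation rule can exploit.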
From the abstract
Understanding the internal mechanisms of transformer-based large language models (LLMs) is crucial for their reliable deployment and effective operation. While recent efforts have yielded a plethora of attribution methods attempting to balance faithfulness and computational efficiency, dense component attribution remains prohibitively expensive. In this work, we introduce Dual Path Attribution (DPA), a novel framework that faithfully traces information flow on the frozen transformer in one forward-backward pass.
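To illustrate what single-pass attribution looks like in practice, here is a generic gradient-times-activation sketch, not DPA's propagation rule: activations are cached by forward hooks, and every dense component is scored after one backward pass. The helper names (dense_attribution, save_output) and the toy model are assumptions.

```python
import torch
import torch.nn as nn

def dense_attribution(model: nn.Module, inputs: torch.Tensor, target_fn):
    """Score every nn.Linear output as activation * gradient, using a single
    forward pass (hooks cache activations) and a single backward pass.
    Illustrative gradient-times-activation, not DPA's propagation rule."""
    cached, handles = {}, []

    def save_output(name):
        def hook(module, inp, out):
            out.retain_grad()          # keep .grad on this non-leaf tensor
            cached[name] = out
        return hook

    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            handles.append(module.register_forward_hook(save_output(name)))

    target_fn(model(inputs)).backward()   # one forward, one backward pass

    # Per-example score for each dense component (summed over features).
    scores = {name: (act * act.grad).sum(dim=-1) for name, act in cached.items()}
    for h in handles:
        h.remove()
    return scores

# Usage on a toy model; the attribution target is the mean of the output.
toy = nn.Sequential(nn.Linear(8, 32), nn.GELU(), nn.Linear(32, 4))
scores = dense_attribution(toy, torch.randn(3, 8), target_fn=lambda y: y.mean())
```

The cost is a constant number of passes regardless of how many components are scored, which is the efficiency regime the abstract contrasts with counterfactual-style methods.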