Achieves dense component attribution in SwiGLU Transformers with a single forward-backward pass, i.e., O(1) model evaluations rather than one per counterfactual.
March 23, 2026
Original Paper
Dual Path Attribution: Efficient Attribution for SwiGLU-Transformers through Layer-Wise Target Propagation
arXiv · 2603.19742
The Takeaway
Tracing information flow in LLMs is usually computationally prohibitive or requires many counterfactual passes. DPA analytically linearizes SwiGLU structures to provide state-of-the-art faithfulness and unprecedented efficiency, enabling real-time interpretability for very long sequences and complex architectures.
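To make "analytically linearizes SwiGLU structures" concrete, here is a minimal sketch assuming one common linearization strategy (freezing the gate activation at its observed value), not the paper's actual derivation. With the gate frozen, the remaining up-projection path is an exact linear map of the input at that point; the names (SwiGLU, linearize_swiglu, d_model, d_ff) are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """Standard SwiGLU feed-forward block: (SiLU(x W_g) * (x W_u)) W_d."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_ff, bias=False)
        self.w_up = nn.Linear(d_model, d_ff, bias=False)
        self.w_down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

def linearize_swiglu(block: SwiGLU, x_ref: torch.Tensor):
    """Locally linear surrogate of `block` around `x_ref`: the gate
    SiLU(x_ref W_g) is frozen, leaving y(x) = (g* * (x W_u)) W_d, which is
    exactly linear in x and matches block(x_ref) at x = x_ref.
    (Illustrative linearization, not DPA's derivation.)"""
    with torch.no_grad():
        gate = F.silu(block.w_gate(x_ref))   # frozen gate activations g*
    def linear_map(x: torch.Tensor) -> torch.Tensor:
        return block.w_down(gate * block.w_up(x))
    return linear_map

# Usage: exact agreement at the reference point, linearity elsewhere.
block = SwiGLU(d_model=16, d_ff=64)
x = torch.randn(2, 5, 16)
f_lin = linearize_swiglu(block, x)
assert torch.allclose(block(x), f_lin(x), atol=1e-6)
assert torch.allclose(f_lin(2 * x), 2 * f_lin(x), atol=1e-5)
```

The surrogate reproduces the real block exactly at the reference input while exposing a per-token linear map, which is the property a layer-wise propagation rule can exploit.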
From the abstract
Understanding the internal mechanisms of transformer-based large language models (LLMs) is crucial for their reliable deployment and effective operation. While recent efforts have yielded a plethora of attribution methods attempting to balance faithfulness and computational efficiency, dense component attribution remains prohibitively expensive. In this work, we introduce Dual Path Attribution (DPA), a novel framework that faithfully traces information flow on the frozen transformer in one forward-backward pass.
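To illustrate what single-pass attribution looks like in practice, here is a generic gradient-times-activation sketch, not DPA's propagation rule: activations are cached by forward hooks, and every dense component is scored after one backward pass. The helper names (dense_attribution, save_output) and the toy model are assumptions.

```python
import torch
import torch.nn as nn

def dense_attribution(model: nn.Module, inputs: torch.Tensor, target_fn):
    """Score every nn.Linear output as activation * gradient, using a single
    forward pass (hooks cache activations) and a single backward pass.
    Illustrative gradient-times-activation, not DPA's propagation rule."""
    cached, handles = {}, []

    def save_output(name):
        def hook(module, inp, out):
            out.retain_grad()          # keep .grad on this non-leaf tensor
            cached[name] = out
        return hook

    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            handles.append(module.register_forward_hook(save_output(name)))

    target_fn(model(inputs)).backward()   # one forward, one backward pass

    # Per-example score for each dense component (summed over features).
    scores = {name: (act * act.grad).sum(dim=-1) for name, act in cached.items()}
    for h in handles:
        h.remove()
    return scores

# Usage on a toy model; the attribution target is the mean of the output.
toy = nn.Sequential(nn.Linear(8, 32), nn.GELU(), nn.Linear(32, 4))
scores = dense_attribution(toy, torch.randn(3, 8), target_fn=lambda y: y.mean())
```

The cost is a constant number of passes regardless of how many components are scored, which is the efficiency regime the abstract contrasts with counterfactual-style methods.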