AI & ML Breaks Assumption

Reveals that linearized attention never converges to the NTK limit in practice, explaining its unique 'influence malleability' compared to standard networks.

March 16, 2026

Original Paper

Influence Malleability in Linearized Attention: Dual Implications of Non-Convergent NTK Dynamics

Jose Marie Antonio Miñoza, Paulo Mario P. Medina, Sebastian C. Ibañez

arXiv · 2603.13085

The Takeaway

The paper challenges the conventional use of kernel frameworks to explain attention, showing that this non-convergence is both the source of attention's expressive power and the root of its specific vulnerability to training-time adversarial attacks.

From the abstract

Understanding the theoretical foundations of attention mechanisms remains challenging due to their complex, non-linear dynamics. This work reveals a fundamental trade-off in the learning dynamics of linearized attention. Using a linearized attention mechanism with exact correspondence to a data-dependent Gram-induced kernel, both empirical and theoretical analyses through the Neural Tangent Kernel (NTK) framework show that linearized attention does not converge to its infinite-width NTK limit, …
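To make the "data-dependent Gram-induced kernel" correspondence concrete, here is a minimal NumPy sketch of linearized attention, i.e. attention with the softmax removed so scores are raw dot products. All names (`linearized_attention`, the weight matrices) are illustrative assumptions, not the authors' code; the only claim it demonstrates is that with identity query/key projections the score matrix reduces to the scaled Gram matrix of the input data.

```python
import numpy as np

rng = np.random.default_rng(0)

def linearized_attention(X, Wq, Wk, Wv):
    """Attention without the softmax: scores are raw scaled dot products."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])  # (n, n) bilinear similarity matrix
    return scores @ V

n, d = 5, 8
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

out = linearized_attention(X, Wq, Wk, Wv)  # shape (n, d)

# With Wq = Wk = I the score matrix is exactly the scaled Gram matrix
# X X^T / sqrt(d) of the inputs -- the "data-dependent Gram-induced kernel"
# the abstract refers to.
gram = X @ X.T / np.sqrt(d)
```

Because the score matrix is a fixed bilinear form in the data rather than a softmax, the mechanism's kernel depends on the inputs themselves, which is what makes an exact kernel correspondence tractable in the first place.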