The 'routing paradox' proves that selective attention requires the very pairwise computations it aims to replace, explaining why pure recurrent models fail at associative recall.
March 24, 2026
Original Paper
When Does Content-Based Routing Work? Representation Requirements for Selective Attention in Hybrid Sequence Models
arXiv · 2603.20997
The Takeaway
The paper reframes attention as a representation constructor (writing pairwise match information into token representations) rather than merely a computation mechanism. The result matters for researchers designing hybrid architectures (such as Mamba-Attention) because it maps exactly how many attention layers are needed to enable routing.
From the abstract
We identify a routing paradox in hybrid recurrent-attention architectures: content-based routing (deciding which tokens deserve expensive attention) requires exactly the pairwise computation that routing is designed to avoid. Through 20+ controlled experiments across three tasks (a synthetic diagnostic, the Zoology MQAR benchmark, and HotpotQA), we map the routing landscape exhaustively. One layer of softmax attention creates a latent ~34-dimensional subspace enabling 98.4% routing precision;
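The paradox the abstract describes can be made concrete with a minimal sketch (hypothetical illustration, not the paper's code): to route, you must score every query against every key, which is the same O(n²) pairwise computation that full attention performs.

```python
import numpy as np

# Hypothetical sketch of the routing paradox: deciding which tokens
# deserve expensive attention requires per-pair relevance scores --
# exactly the O(n^2) query-key products that full attention computes.
rng = np.random.default_rng(0)
n, d = 8, 16
queries = rng.standard_normal((n, d))
keys = rng.standard_normal((n, d))

# Pairwise relevance: the same n x n score matrix attention would build.
scores = queries @ keys.T                   # shape (n, n), cost O(n^2 * d)

# "Routing": keep only the top-k most relevant keys per query.
k = 2
top_k = np.argsort(scores, axis=1)[:, -k:]  # indices of selected tokens

# The selection step itself already incurred the full pairwise cost,
# so routing did not avoid the computation it was meant to replace.
print(scores.shape)  # (8, 8)
print(top_k.shape)   # (8, 2)
```

The sparse top-k attention that follows would be cheap, but only because the expensive pairwise scoring was already paid for up front.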