Stochastic Attention achieves a global receptive field in O(log n) layers by using randomized routing inspired by the fruit fly connectome.
April 2, 2026
Original Paper
Stochastic Attention: Connectome-Inspired Randomized Routing for Expressive Linear-Time Attention
arXiv · 2604.00754
The Takeaway
Stochastic Attention offers a training-free way to upgrade sliding-window attention to global attention while keeping linear-time complexity. This makes it a practical primitive for scaling Transformer sequence lengths without the quadratic cost, and without the performance drops typical of traditional sparse attention.
From the abstract
The whole-brain connectome of a fruit fly comprises over 130K neurons connected with a probability of merely 0.02%, yet achieves an average shortest path of only 4.4 hops. Despite being highly structured at the circuit level, the network's long-range connections are broadly distributed across brain regions, functioning as stochastic shortcuts that enable efficient global communication. Inspired by this observation, we propose Stochastic Attention (SA), a drop-in enhancement for sliding-window attention.
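The small-world effect the abstract describes can be sketched with a toy mask experiment. The snippet below is a minimal illustration, not the paper's implementation: it builds a sliding-window attention mask, adds a few uniformly random "shortcut" positions per query (the window size and shortcut count here are hypothetical parameters), and counts how many stacked layers are needed before every token's receptive field covers the whole sequence.

```python
import numpy as np

def stochastic_attention_mask(n, window=4, n_shortcuts=2, seed=0):
    """Boolean (n, n) mask: each query attends to a local window of
    +/- `window` positions plus `n_shortcuts` random positions.
    Parameters are illustrative, not taken from the paper."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((n, n), dtype=bool)
    idx = np.arange(n)
    for offset in range(-window, window + 1):  # includes self (offset 0)
        j = idx + offset
        valid = (j >= 0) & (j < n)
        mask[idx[valid], j[valid]] = True
    for i in range(n):  # randomized long-range shortcuts
        mask[i, rng.integers(0, n, size=n_shortcuts)] = True
    return mask

def layers_to_global(mask, max_layers=32):
    """Stacked layers until every token can reach every position.
    Boolean reachability via repeated matrix products; returns None
    if global coverage is not reached within `max_layers`."""
    reach = np.eye(mask.shape[0], dtype=bool)
    for layer in range(1, max_layers + 1):
        # one more attention hop of reachability
        reach = (reach.astype(int) @ mask.astype(int)) > 0
        if reach.all():
            return layer
    return None

if __name__ == "__main__":
    n = 256
    with_shortcuts = stochastic_attention_mask(n, window=4, n_shortcuts=2)
    window_only = stochastic_attention_mask(n, window=4, n_shortcuts=0)
    print("with shortcuts:", layers_to_global(with_shortcuts))
    print("window only:   ", layers_to_global(window_only))
```

With a pure window of +/-4, reaching across 256 tokens needs on the order of n/window hops, so the window-only mask fails within 32 layers; a couple of random shortcuts per token collapses this to a handful of hops, mirroring the fly connectome's 4.4-hop average path despite 0.02% connectivity.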