Replaces quadratic self-attention with $O(N \log N)$ phase-native coupling for time-series, enabling massive context windows.
March 19, 2026
Original Paper
The Phasor Transformer: Resolving Attention Bottlenecks on the Unit Circle
arXiv · 2603.17433
The Takeaway
By mapping sequence states to the unit circle and using Discrete Fourier Transforms for token mixing, this architecture eliminates the attention bottleneck in long-context temporal modeling. It provides a practical alternative to Transformers for high-frequency, long-horizon time-series forecasting.
From the abstract
Transformer models have redefined sequence learning, yet dot-product self-attention introduces a quadratic token-mixing bottleneck for long-context time-series. We introduce the Phasor Transformer block, a phase-native alternative representing sequence states on the unit-circle manifold $S^1$. Each block combines lightweight trainable phase-shifts with parameter-free Discrete Fourier Transform (DFT) token coupling, achieving global $\mathcal{O}(N\log N)$ mixing without explicit attention.
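The core idea from the abstract can be sketched in a few lines: lift states onto the unit circle, apply a learnable phase shift, and mix tokens with an FFT instead of attention. This is a minimal illustration, not the paper's implementation; the function name `phasor_block`, the fixed (untrained) phase shift, and the choice to return angles are all assumptions made for clarity.

```python
import numpy as np

def phasor_block(x: np.ndarray, phase_shift: np.ndarray) -> np.ndarray:
    """Hypothetical sketch of one phase-native mixing block.

    x           : (N, d) real-valued sequence states
    phase_shift : (N, d) per-position shifts (trainable in the paper;
                  fixed here for illustration)
    """
    # Map states to the unit-circle manifold S^1: z = e^{i(x + shift)}
    z = np.exp(1j * (x + phase_shift))
    # Parameter-free DFT token coupling along the sequence axis:
    # global mixing in O(N log N) via the FFT, no attention matrix.
    mixed = np.fft.fft(z, axis=0) / np.sqrt(z.shape[0])
    # Return to phase representation (angles in (-pi, pi])
    return np.angle(mixed)

# Example: 8 tokens with 4 channels, zero phase shift
out = phasor_block(np.random.randn(8, 4), np.zeros((8, 4)))
```

The key contrast with attention is cost: the FFT couples every token with every other token at $\mathcal{O}(N\log N)$, whereas a dot-product attention matrix costs $\mathcal{O}(N^2)$.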