Replaces quadratic-cost self-attention in Diffusion Transformers with a learnable convection-diffusion-reaction PDE solved in the Fourier domain.
March 17, 2026
Original Paper
PDE-SSM: A Spectral State Space Approach to Spatial Mixing in Diffusion Transformers
arXiv · 2603.13663
The Takeaway
This approach achieves O(N log N) complexity in the number of tokens N, compared with O(N²) for self-attention, while encoding strong spatial inductive biases directly into the architecture. It matches state-of-the-art generative performance while significantly reducing the compute required for high-resolution image and video synthesis.
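To make the O(N log N) claim concrete, here is a minimal sketch of spectral mixing with a convection-diffusion-reaction equation. It is not the paper's implementation. It assumes a 1D periodic grid with unit spacing and fixed scalar coefficients, whereas PDE-SSM learns its operator and works on image and video tokens. The names `fft`, `pde_mix`, `velocity`, `diffusivity`, and `reaction` are hypothetical and chosen only for this illustration.

```python
import cmath
import math


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def pde_mix(u, velocity, diffusivity, reaction, t=1.0):
    """Evolve u_t = -velocity*u_x + diffusivity*u_xx + reaction*u for time t.

    Assumptions (illustrative only): 1D periodic grid, unit spacing,
    constant scalar coefficients. Each Fourier mode evolves independently,
    so the total cost is two FFTs plus one pointwise multiply.
    """
    n = len(u)
    u_hat = fft([complex(v) for v in u])
    for k in range(n):
        # Map the FFT index to a signed frequency, then to an angular wavenumber.
        freq = k if k <= n // 2 else k - n
        w = 2.0 * math.pi * freq / n
        # Fourier symbol of the operator: convection, diffusion, reaction.
        symbol = -1j * velocity * w - diffusivity * w * w + reaction
        u_hat[k] *= cmath.exp(symbol * t)
    # The inverse transform is unnormalized, so divide by n here.
    return [v.real / n for v in fft(u_hat, inverse=True)]


if __name__ == "__main__":
    impulse = [1.0] + [0.0] * 7
    # Pure convection moves the impulse; diffusion spreads it out.
    print(pde_mix(impulse, velocity=2.0, diffusivity=0.0, reaction=0.0))
    print(pde_mix(impulse, velocity=0.0, diffusivity=0.5, reaction=0.0))
```

In this toy setting, pure convection with an integer velocity acts as a circular shift, diffusion damps high frequencies while preserving the signal's total sum, and the reaction term rescales every mode uniformly. Every output position depends on every input position, yet no pairwise token interaction is ever computed.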
From the abstract
The success of vision transformers, especially for generative modeling, is limited by the quadratic cost and weak spatial inductive bias of self-attention. We propose PDE-SSM, a spatial state-space block that replaces attention with a learnable convection-diffusion-reaction partial differential equation. This operator encodes a strong spatial prior by modeling information flow via physically grounded dynamics rather than all-to-all token interactions. Solving the PDE in the Fourier domain yields g