Leverages human gaze tracking to assign non-uniform token density in diffusion models, producing images that are perceptually indistinguishable from full-resolution output at significantly lower compute.
March 25, 2026
Original Paper
Foveated Diffusion: Efficient Spatially Adaptive Image and Video Generation
arXiv · 2603.23491
The Takeaway
By allocating full resolution only to the foveal region around the viewer's gaze, this method sharply reduces the number of generated tokens, and with it the quadratic self-attention cost of generation. This is a significant step toward real-time, high-resolution VR/AR content generation.
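The compute saving compounds because self-attention cost scales with the square of the token count. A minimal sketch of the idea, assuming a 64×64 patch grid where full-density patches are kept only within a foveal radius of the gaze point and peripheral patches are merged into coarser cells (all parameter names and values here are illustrative, not from the paper):

```python
import math

def foveated_token_count(grid=64, gaze=(32, 32), radius=8, coarse_factor=4):
    """Count tokens when fine patches are kept only near the gaze point
    and peripheral patches are merged coarse_factor x coarse_factor.
    Illustrative assumption, not the paper's actual tokenization scheme."""
    fine = 0
    coarse_cells = set()
    for y in range(grid):
        for x in range(grid):
            if math.hypot(x - gaze[0], y - gaze[1]) <= radius:
                fine += 1  # full-resolution token in the foveal region
            else:
                # peripheral fine patches collapse into one coarse token
                coarse_cells.add((y // coarse_factor, x // coarse_factor))
    return fine + len(coarse_cells)

uniform = 64 * 64                      # 4096 tokens at uniform density
fov = foveated_token_count()           # far fewer tokens overall
# Attention FLOPs scale ~tokens^2, so the speedup is quadratic in the ratio:
speedup = uniform ** 2 / fov ** 2
```

Even a modest reduction in token count yields a much larger reduction in attention cost, which is what makes this approach attractive for streaming, gaze-contingent rendering.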
From the abstract
Diffusion and flow matching models have unlocked unprecedented capabilities for creative content creation, such as interactive image and streaming video generation. The growing demand for higher resolutions, frame rates, and context lengths, however, makes efficient generation increasingly challenging, as computational complexity grows quadratically with the number of generated tokens. Our work seeks to optimize the efficiency of the generation process in settings where the user's gaze location […]