AI & ML Efficiency Breakthrough

EdgeDiT provides a hardware-aware blueprint for running massive Diffusion Transformers (DiT) on mobile NPUs, cutting latency by 1.6x.

March 31, 2026

Original Paper

EdgeDiT: Hardware-Aware Diffusion Transformers for Efficient On-Device Image Generation

Sravanth Kodavanti, Manjunath Arveti, Sowmya Vajrala, Srinivas Miriyala, Vikram N R

arXiv · 2603.28405

The Takeaway

The framework systematically prunes structural redundancies that are especially costly for the data flows of Apple and Qualcomm mobile NPUs. The result is a new Pareto-optimal frontier for on-device image generation, enabling private, offline, high-fidelity synthesis.
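To make the idea of structural pruning concrete, here is a minimal, hypothetical sketch (not the paper's actual method): removing entire attention heads rather than individual weights keeps every remaining tensor dense, which is the kind of regular data flow mobile NPUs execute efficiently. All names and the norm-based scoring rule below are illustrative assumptions.

```python
import numpy as np

def prune_heads(w_qkv, w_out, num_heads, keep_heads):
    """Illustrative structural pruning: drop whole attention heads.

    Heads are scored by the L2 norm of their slice of the output
    projection (a common magnitude heuristic, assumed here, not taken
    from the paper), and only the top `keep_heads` are retained.

    w_qkv: (3*d_model, d_model) stacked Q/K/V projection weights
    w_out: (d_model, d_model) output projection weights
    """
    d_model = w_out.shape[1]
    head_dim = d_model // num_heads
    # Score each head by the norm of its output-projection columns.
    scores = np.array([
        np.linalg.norm(w_out[:, h * head_dim:(h + 1) * head_dim])
        for h in range(num_heads)
    ])
    keep = np.sort(np.argsort(scores)[-keep_heads:])  # retained head indices
    cols = np.concatenate(
        [np.arange(h * head_dim, (h + 1) * head_dim) for h in keep]
    )
    # Slice the same head rows/columns out of Q, K, V and the output proj,
    # producing smaller but still fully dense weight matrices.
    q, k, v = np.split(w_qkv, 3, axis=0)
    w_qkv_pruned = np.concatenate([q[cols], k[cols], v[cols]], axis=0)
    w_out_pruned = w_out[:, cols]
    return w_qkv_pruned, w_out_pruned, keep

# Toy usage: prune 8 heads down to 6 in a 64-dim block.
rng = np.random.default_rng(0)
d, heads = 64, 8
w_qkv = rng.standard_normal((3 * d, d))
w_out = rng.standard_normal((d, d))
qkv_p, out_p, kept = prune_heads(w_qkv, w_out, num_heads=heads, keep_heads=6)
print(qkv_p.shape, out_p.shape, len(kept))  # (144, 64) (64, 48) 6
```

Because whole heads are removed, the pruned matrices remain dense and contiguous; no sparse kernels or irregular memory access patterns are needed at inference time, which is what makes this style of pruning friendly to NPU data flows.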

From the abstract

Diffusion Transformers (DiT) have established a new state-of-the-art in high-fidelity image synthesis; however, their massive computational complexity and memory requirements hinder local deployment on resource-constrained edge devices. In this paper, we introduce EdgeDiT, a family of hardware-efficient generative transformers specifically engineered for mobile Neural Processing Units (NPUs), such as the Qualcomm Hexagon and Apple Neural Engine (ANE). By leveraging a hardware-aware optimization…