AI & ML Efficiency Breakthrough

Introduces adaptive video tokenization that allocates tokens based on scene complexity, reducing token usage by 24% while improving reconstruction quality.

March 13, 2026

Original Paper

EVATok: Adaptive Length Video Tokenization for Efficient Visual Autoregressive Generation

Tianwei Xiong, Jun Hao Liew, Zilong Huang, Zhijie Lin, Jiashi Feng, Xihui Liu

arXiv · 2603.12267

The Takeaway

Fixed-length tokenizers waste compute on static video segments; EVATok uses lightweight routers to predict suitable token assignments per video block. This significantly lowers the computational cost of downstream autoregressive video generation models, including Sora-style architectures.
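The routing idea can be sketched in a few lines. The snippet below uses mean absolute inter-frame difference as a stand-in complexity score and a hard-coded threshold table as a stand-in router; EVATok's actual router is learned, and every name, threshold, and budget here is an illustrative assumption, not the paper's method.

```python
import numpy as np

def block_complexity(frames: np.ndarray) -> float:
    """Proxy complexity score: mean absolute inter-frame difference.
    frames: (T, H, W) grayscale block with pixel values in [0, 1].
    This heuristic stands in for EVATok's learned router."""
    if len(frames) < 2:
        return 0.0
    return float(np.mean(np.abs(np.diff(frames.astype(np.float32), axis=0))))

def route_token_budget(score: float, budgets=(64, 128, 256)) -> int:
    """Map a complexity score to a discrete token budget.
    Thresholds are hypothetical; static blocks get the smallest budget."""
    if score < 0.05:
        return budgets[0]
    if score < 0.2:
        return budgets[1]
    return budgets[2]

rng = np.random.default_rng(0)
static_block = np.full((8, 32, 32), 0.5)    # no frame-to-frame change
dynamic_block = rng.random((8, 32, 32))     # large frame-to-frame change

for name, blk in [("static", static_block), ("dynamic", dynamic_block)]:
    s = block_complexity(blk)
    print(name, round(s, 3), route_token_budget(s))
```

The static block scores zero and receives the minimum budget, while the high-motion block is routed to the largest one, which is the behavior the takeaway describes.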

From the abstract

Autoregressive (AR) video generative models rely on video tokenizers that compress pixels into discrete token sequences. The length of these token sequences is crucial for balancing reconstruction quality against downstream generation computational cost. Traditional video tokenizers apply a uniform token assignment across temporal blocks of different videos, often wasting tokens on simple, static, or repetitive segments while underserving dynamic or complex ones. To address this inefficiency, we …
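The inefficiency the abstract describes can be made concrete with a toy budget comparison. All complexity scores and budget numbers below are invented for illustration; the allocation rule (scale a base budget by normalized complexity, with a floor) is a simple assumption, not EVATok's scheme.

```python
# Toy comparison of uniform vs. adaptive token allocation across
# temporal blocks. All numbers are invented for illustration.
complexity = [0.1, 0.1, 0.9, 0.8, 0.1]   # hypothetical per-block scene complexity

uniform_total = 256 * len(complexity)     # fixed-length: 256 tokens per block

# Adaptive: scale a base budget by normalized complexity, with a floor
# so even static blocks keep a minimum token count.
base, floor = 256, 64
adaptive = [max(floor, round(base * c / max(complexity))) for c in complexity]

print("uniform total:", uniform_total)
print("adaptive budgets:", adaptive, "total:", sum(adaptive))
```

Under the uniform tokenizer the three near-static blocks consume full 256-token budgets; the adaptive allocation concentrates tokens on the two complex blocks and comes out well below the uniform total, which is the kind of saving behind the reported 24% reduction.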