AI & ML Efficiency Breakthrough

A 1D continuous image tokenizer that uses semantic masking to achieve a 64x reduction in token usage without sacrificing generation fidelity.

April 1, 2026

Original Paper

MacTok: Robust Continuous Tokenization for Image Generation

Hengyu Zeng, Xin Gao, Guanghao Li, Yuxiang Yan, Jiaoyang Ruan, Junpeng Ma, Haoyu Albert Wang, Jian Pu

arXiv · 2603.29634

The Takeaway

By preventing posterior collapse through DINO-guided masking, MacTok enables high-quality 512x512 image generation using as few as 64 tokens. This drastically lowers the compute floor for high-fidelity generative vision models.
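As a rough illustration of the masking idea (not the paper's actual implementation; the saliency scores and keep ratio below are stand-ins), semantic masking can be sketched as keeping only the most salient image patches visible to the tokenizer during training:

```python
import numpy as np

def semantic_mask(patch_saliency: np.ndarray, keep_ratio: float) -> np.ndarray:
    """Return a boolean mask keeping the most salient patches.

    patch_saliency: per-patch importance scores (in the real method these
    would come from DINO features; here they are random stand-ins).
    keep_ratio: fraction of patches left visible to the encoder.
    """
    n_keep = max(1, int(len(patch_saliency) * keep_ratio))
    keep_idx = np.argsort(patch_saliency)[-n_keep:]  # indices of highest scores
    mask = np.zeros(len(patch_saliency), dtype=bool)
    mask[keep_idx] = True
    return mask

# Toy example: 16 patches of a 4x4 grid, keep the top 25%.
rng = np.random.default_rng(0)
saliency = rng.random(16)  # stand-in for DINO-derived saliency scores
mask = semantic_mask(saliency, keep_ratio=0.25)
print(mask.sum())  # -> 4 visible patches
```

The intuition is that forcing the encoder to reconstruct from only the semantically important patches pressures every latent token to carry information, counteracting collapse at small token budgets.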

From the abstract

Continuous image tokenizers enable efficient visual generation, and those based on variational frameworks can learn smooth, structured latent representations through KL regularization. Yet this often leads to posterior collapse when using fewer tokens, where the encoder fails to encode informative features into the compressed latent space. To address this, we introduce **MacTok**, a **M**asked **A**ugmenting 1D **C**ontinuous **Tok**enizer that leverages image masking an…
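For context on the failure mode the abstract names: in a KL-regularized tokenizer, posterior collapse means the per-token posterior q(z|x) = N(mu, sigma^2) degenerates to the prior N(0, 1), so the KL term, and with it the information the token carries, drops to zero. A minimal numeric sketch (the example values are illustrative, not from the paper):

```python
import numpy as np

def kl_diag_gaussian(mu: np.ndarray, logvar: np.ndarray) -> float:
    """KL( N(mu, exp(logvar)) || N(0, I) ) summed over latent dimensions."""
    return float(0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar))

# An informative token: its posterior clearly differs from the prior.
informative = kl_diag_gaussian(np.array([1.5, -0.8]), np.array([-1.0, -1.0]))

# A collapsed token: posterior equals the prior, so KL is exactly zero
# and the decoder learns nothing about the input from this token.
collapsed = kl_diag_gaussian(np.zeros(2), np.zeros(2))

print(informative > 0, collapsed == 0.0)
```

With aggressive compression (few tokens, strong KL pressure), the optimizer can push many tokens into the collapsed regime; MacTok's masking augmentation is presented as a way to keep them informative.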