Tuning autoregressive image models with Group Relative Policy Optimization (GRPO) matches the quality of Classifier-Free Guidance without its 2x inference cost.
March 25, 2026
Original Paper
Policy-based Tuning of Autoregressive Image Models with Instance- and Distribution-Level Rewards
arXiv · 2603.23086
The Takeaway
The paper's GRPO-based tuning enables RL alignment for image generation that improves both quality and diversity without mode collapse. By removing the need for Classifier-Free Guidance, it avoids CFG's doubled forward passes and delivers a significant inference speedup for production-grade image synthesis.
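As a minimal sketch of the idea behind GRPO (not the paper's implementation): for each prompt, a group of samples is drawn and each sample's reward is normalized against the group's mean and standard deviation, yielding a relative advantage without a learned value model. The reward values below are illustrative placeholders.

```python
import numpy as np

def grpo_advantages(rewards):
    """Group-relative advantages: normalize each sample's reward
    against the mean and std of its own group (the core of GRPO)."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)  # eps avoids divide-by-zero

# e.g. rewards for four images sampled from the same prompt
adv = grpo_advantages([0.9, 0.4, 0.6, 0.5])
```

Samples above the group mean receive positive advantages and are reinforced; those below are suppressed, all relative to the policy's own current samples.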
From the abstract
Autoregressive (AR) models are highly effective for image generation, yet their standard maximum-likelihood estimation training lacks direct optimization for sample quality and diversity. While reinforcement learning (RL) has been used to align diffusion models, these methods typically suffer from output diversity collapse. Similarly, concurrent RL methods for AR models rely strictly on instance-level rewards, often trading off distributional coverage for quality. To address these limitations, w