Identifies a fundamental conflict in Direct Preference Optimization (DPO) for unified multimodal models: understanding improves under alignment while image generation quality resists it.
March 19, 2026
Original Paper
Do Understanding and Generation Fight? A Diagnostic Study of DPO for Unified Multimodal Models
arXiv · 2603.17044
The Takeaway
The study reveals a massive gradient-magnitude imbalance (11-14×) between text tokens and VQ tokens that sabotages multi-task alignment. It provides a roadmap for practitioners to fix interference in unified models like Janus-Pro by balancing task-specific gradients.
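One way a practitioner might act on that diagnosis is to measure each task's gradient norm on the shared backbone and rescale the losses before the combined update. Below is a minimal PyTorch sketch of norm-based rebalancing (in the spirit of GradNorm), assuming hypothetical handles `model`, `und_loss` (text-token DPO loss), and `gen_loss` (VQ-token DPO loss); it illustrates the idea, not the paper's exact procedure.

```python
import torch

def shared_grad_norm(loss, shared_params):
    # L2 norm of d(loss)/d(shared backbone params); keep the graph so the
    # same forward pass can be reused for the other task and the final update.
    grads = torch.autograd.grad(loss, shared_params,
                                retain_graph=True, allow_unused=True)
    return torch.sqrt(sum(g.pow(2).sum() for g in grads if g is not None))

# Hypothetical handles: a unified model with a shared LM backbone, plus the
# two task-specific DPO losses computed on the same batch.
shared = [p for p in model.backbone.parameters() if p.requires_grad]
g_text = shared_grad_norm(und_loss, shared)  # understanding (text tokens)
g_vq = shared_grad_norm(gen_loss, shared)    # generation (VQ tokens)

# With an 11-14x text/VQ imbalance, downscale the text-side loss so both
# tasks pull on the shared backbone with comparable force.
scale = (g_vq / (g_text + 1e-8)).detach()
(scale * und_loss + gen_loss).backward()
```

The `detach()` on the ratio keeps the rescaling factor out of the computation graph, so it acts as a per-step constant rather than something the optimizer can game.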
From the abstract
Unified multimodal models share a language model backbone for both understanding and generating images. Can DPO align both capabilities simultaneously? We present the first systematic study of this question, applying DPO to Janus-Pro at 1B and 7B parameters under seven training strategies and two post-hoc methods. The central finding is negative: generation quality resists DPO alignment across all tested conditions on this architecture. No method improves generation CLIPScore at 7B (|Δ| < 0.5).
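For context, the objective being applied per task is the standard DPO preference loss over (chosen, rejected) pairs. A minimal sketch, with tensor shapes assumed for illustration; in a unified model the same loss would be computed over text tokens for understanding pairs and over VQ tokens for image-generation pairs.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_logps, ref_logps, beta=0.1):
    # policy_logps / ref_logps: (batch, 2) summed sequence log-probabilities,
    # column 0 = chosen response, column 1 = rejected response.
    pi_margin = policy_logps[:, 0] - policy_logps[:, 1]
    ref_margin = ref_logps[:, 0] - ref_logps[:, 1]
    # -log sigmoid(beta * (policy margin - reference margin)), averaged
    return -F.logsigmoid(beta * (pi_margin - ref_margin)).mean()
```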