Identifies a fundamental conflict in Direct Preference Optimization (DPO) for unified multimodal models: understanding improves under alignment while image generation quality resists it.
March 19, 2026
Original Paper
Do Understanding and Generation Fight? A Diagnostic Study of DPO for Unified Multimodal Models
arXiv · 2603.17044
The Takeaway
The study reveals a massive gradient-magnitude imbalance (11-14×) between text tokens and VQ tokens that sabotages multi-task alignment. It provides a roadmap for practitioners to fix interference in unified models like Janus-Pro by balancing task-specific gradients.
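One way a practitioner might act on that diagnosis is to measure each task's gradient norm on the shared backbone and rescale the losses before the combined update. Below is a minimal PyTorch sketch of norm-based rebalancing (in the spirit of GradNorm), assuming hypothetical handles `model`, `und_loss` (text-token DPO loss), and `gen_loss` (VQ-token DPO loss); it illustrates the idea, not the paper's exact procedure.

```python
import torch

def shared_grad_norm(loss, shared_params):
    # L2 norm of d(loss)/d(shared backbone params); keep the graph so the
    # same forward pass can be reused for the other task and the final update.
    grads = torch.autograd.grad(loss, shared_params,
                                retain_graph=True, allow_unused=True)
    return torch.sqrt(sum(g.pow(2).sum() for g in grads if g is not None))

# Hypothetical handles: a unified model with a shared LM backbone, plus the
# two task-specific DPO losses computed on the same batch.
shared = [p for p in model.backbone.parameters() if p.requires_grad]
g_text = shared_grad_norm(und_loss, shared)  # understanding (text tokens)
g_vq = shared_grad_norm(gen_loss, shared)    # generation (VQ tokens)

# With an 11-14x text/VQ imbalance, downscale the text-side loss so both
# tasks pull on the shared backbone with comparable force.
scale = (g_vq / (g_text + 1e-8)).detach()
(scale * und_loss + gen_loss).backward()
```

The `detach()` on the ratio keeps the rescaling factor out of the computation graph, so it acts as a per-step constant rather than something the optimizer can game.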
From the abstract
Unified multimodal models share a language model backbone for both understanding and generating images. Can DPO align both capabilities simultaneously? We present the first systematic study of this question, applying DPO to Janus-Pro at 1B and 7B parameters under seven training strategies and two post-hoc methods. The central finding is negative: generation quality resists DPO alignment across all tested conditions on this architecture. No method improves generation CLIPScore at 7B (|Δ| < 0.5).
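For context, the objective being applied per task is the standard DPO preference loss over (chosen, rejected) pairs. A minimal sketch, with tensor shapes assumed for illustration; in a unified model the same loss would be computed over text tokens for understanding pairs and over VQ tokens for image-generation pairs.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_logps, ref_logps, beta=0.1):
    # policy_logps / ref_logps: (batch, 2) summed sequence log-probabilities,
    # column 0 = chosen response, column 1 = rejected response.
    pi_margin = policy_logps[:, 0] - policy_logps[:, 1]
    ref_margin = ref_logps[:, 0] - ref_logps[:, 1]
    # -log sigmoid(beta * (policy margin - reference margin)), averaged
    return -F.logsigmoid(beta * (pi_margin - ref_margin)).mean()
```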