Identifies 'Visual Anchor Collapse' in DPO-aligned VLMs and introduces an asymmetric constraint to prevent models from ignoring visual evidence in favor of language priors.
March 24, 2026
Original Paper
ACPO: Counteracting Likelihood Displacement in Vision-Language Alignment with Asymmetric Constraints
arXiv · 2603.22165
The Takeaway
Standard DPO causes the probability of chosen responses to collapse alongside rejected ones, leading to hallucinations in multimodal models. ACPO fixes this by asymmetrically suppressing the rejected gradient, significantly improving visual grounding on major benchmarks like HallusionBench.
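The paper's exact objective is not reproduced in this digest, so the sketch below only illustrates the general idea: a standard DPO-style preference loss next to a hypothetical asymmetric variant in which a down-weighting coefficient `lam` (an assumption, not the paper's notation) shrinks the gradient flowing through the rejected response, so the optimizer cannot "win" the margin purely by crushing the rejected likelihood.

```python
import math

def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def dpo_loss(logr_chosen: float, logr_rejected: float, beta: float = 0.1) -> float:
    # Standard DPO: maximize the gap between the chosen and rejected
    # policy/reference log-ratios. Both terms enter symmetrically, so the
    # margin can grow by driving the rejected (and, via shared features,
    # the chosen) likelihood down.
    return -math.log(_sigmoid(beta * (logr_chosen - logr_rejected)))

def asymmetric_loss(logr_chosen: float, logr_rejected: float,
                    beta: float = 0.1, lam: float = 0.3) -> float:
    # Hypothetical asymmetric constraint: the rejected log-ratio is scaled
    # by lam < 1, so its gradient is suppressed relative to the chosen term
    # and most of the improvement must come from raising the chosen likelihood.
    return -math.log(_sigmoid(beta * (logr_chosen - lam * logr_rejected)))
```

A quick finite-difference check confirms the intent: at the same operating point, the magnitude of the loss gradient with respect to the rejected log-ratio is smaller for the asymmetric variant than for standard DPO.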
From the abstract
While Direct Preference Optimization (DPO) has become the de facto approach for aligning Large Vision-Language Models (LVLMs), it suffers from Likelihood Displacement, where the probability of both chosen and rejected responses collapses. This optimization flaw is especially detrimental in multimodal settings: the erosion of chosen likelihoods -- a failure we term Visual Anchor Collapse -- causes models to abandon visual evidence for strong language priors, precipitating significant hallucinations.