AI & ML Paradigm Shift

Identifies 'Visual Anchor Collapse' in DPO-aligned VLMs and introduces an asymmetric constraint to prevent models from ignoring visual evidence in favor of language priors.

March 24, 2026

Original Paper

ACPO: Counteracting Likelihood Displacement in Vision-Language Alignment with Asymmetric Constraints

Kaili Huang, Hongming Zhang, Rui Shen, Linjun Dai, Jiahao Wang, Hanming Deng, Lewei Lu

arXiv · 2603.22165

The Takeaway

Standard DPO causes the probability of chosen responses to collapse alongside that of rejected ones, leading to hallucinations in multimodal models. ACPO fixes this by asymmetrically suppressing the gradient from the rejected response, significantly improving visual grounding on major benchmarks like HallusionBench.
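To make the failure mode concrete, here is a minimal sketch in plain Python contrasting the standard DPO objective with an asymmetric variant in the spirit of what the summary describes. The `reject_weight` knob is a hypothetical illustration of "asymmetrically suppressing the rejected gradient", not the paper's actual ACPO formulation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO: maximizes the chosen-vs-rejected log-ratio margin.
    Because both terms share one gradient, the optimizer can satisfy the
    margin by dragging the chosen likelihood down along with the rejected
    one (the "likelihood displacement" the summary describes)."""
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    return -math.log(sigmoid(beta * margin))

def asymmetric_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected,
                    beta=0.1, reject_weight=0.3):
    """Hypothetical asymmetric variant: down-weighting the rejected
    log-ratio (reject_weight < 1) means suppressing the rejected response
    contributes less gradient than raising the chosen one, so the margin
    is preferentially closed by keeping chosen likelihood high.
    Illustrative only; ACPO's exact constraint may differ."""
    margin = ((logp_chosen - ref_chosen)
              - reject_weight * (logp_rejected - ref_rejected))
    return -math.log(sigmoid(beta * margin))

# With the same log-probabilities, the asymmetric loss sees a smaller
# effective margin, i.e. less incentive to push the rejected term down.
same_args = dict(logp_chosen=-1.0, logp_rejected=-2.0,
                 ref_chosen=-1.5, ref_rejected=-1.5)
print(dpo_loss(**same_args))
print(asymmetric_loss(**same_args))
```

With equal chosen/rejected log-ratios of opposite sign, the standard margin is 1.0 while the down-weighted one is 0.65, so the asymmetric objective leans more heavily on the chosen-response term.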

From the abstract

While Direct Preference Optimization (DPO) has become the de facto approach for aligning Large Vision-Language Models (LVLMs), it suffers from Likelihood Displacement, where the probability of both chosen and rejected responses collapses. This optimization flaw is especially detrimental in multimodal settings: the erosion of chosen likelihoods -- a failure we term Visual Anchor Collapse -- causes models to abandon visual evidence for strong language priors, precipitating significant hallucinations.