AI & ML Breaks Assumption

FINER shows that MLLMs are highly prone to hallucination when queries contain fine-grained mismatches that co-occur with real image elements.

March 19, 2026

Original Paper

FINER: MLLMs Hallucinate under Fine-grained Negative Queries

Rui Xiao, Sanghwan Kim, Yongqin Xian, Zeynep Akata, Stephan Alaniz

arXiv · 2603.17662

The Takeaway

The paper exposes a specific, previously overlooked failure mode in vision-language models. Its proposed 'FINER-Tuning' recipe, based on direct preference optimization (DPO), boosts hallucination resistance by up to 24% across multiple frontier models and benchmarks.
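
FINER-Tuning is a DPO-based fine-tuning recipe. As a rough sketch of the underlying objective, here is the standard DPO loss (Rafailov et al., 2023) in PyTorch, assuming preference pairs in which the 'chosen' response answers a fine-grained negative query faithfully and the 'rejected' response hallucinates; this illustrates the idea, not the authors' released code:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO objective over per-sequence summed log-probs, shape (batch,).

    In a FINER-Tuning-style setup (an assumption, not the paper's code),
    'chosen' is the faithful answer to a fine-grained negative query and
    'rejected' is the hallucinated one.
    """
    chosen_margin = policy_chosen_logps - ref_chosen_logps
    rejected_margin = policy_rejected_logps - ref_rejected_logps
    # Maximize the log-odds that the policy prefers faithful answers,
    # measured relative to a frozen reference model.
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

# Toy usage on random log-probs for a batch of 4 preference pairs.
b = 4
loss = dpo_loss(torch.randn(b, requires_grad=True), torch.randn(b),
                torch.randn(b), torch.randn(b))
loss.backward()
print(f"DPO loss: {loss.item():.4f}")
```

The reference log-probabilities anchor the policy to its pretrained behavior, so the model learns to prefer faithful answers without drifting on ordinary queries.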

From the abstract

Multimodal large language models (MLLMs) struggle with hallucinations, particularly with fine-grained queries, a challenge underrepresented by existing benchmarks that focus on coarse image-related questions. We introduce FIne-grained NEgative queRies (FINER), alongside two benchmarks: FINER-CompreCap and FINER-DOCCI. Using FINER, we analyze hallucinations across four settings: multi-object, multi-attribute, multi-relation, and "what" questions. Our benchmarks reveal that MLLMs hallucinate when fine-grained mismatches co-occur with real image elements.
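
To make "fine-grained negative query" concrete, here is a hypothetical toy construction (the object names, swap table, and question template are illustrative; the excerpt does not describe FINER's actual pipeline). The idea is to perturb one attribute of an object that really appears in the image, so the query asks about a near-miss variant co-occurring with real content:

```python
# Hypothetical attribute swaps; FINER's real vocabularies are not given here.
COLOR_SWAPS = {"red": "blue", "blue": "green", "green": "red"}

def make_negative_query(obj: str, true_attr: str) -> str:
    """Swap one attribute of a real object so the query references a
    fine-grained variant that is NOT in the image, while the object is."""
    wrong_attr = COLOR_SWAPS.get(true_attr, "purple")
    return f"What is the {wrong_attr} {obj} doing in the image?"

# Ground truth: the image contains a red car.
print(make_negative_query("car", "red"))
# -> "What is the blue car doing in the image?"
# A faithful MLLM should note there is no blue car instead of answering.
```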