AI & ML Efficiency Breakthrough

SAGE mitigates multimodal hallucinations by monitoring 'attention sinks' and dynamically modulating self-attention during the decoding process.

March 31, 2026

Original Paper

SAGE: Sink-Aware Grounded Decoding for Multimodal Hallucination Mitigation

Tripti Shukla, Zsolt Kira

arXiv · 2603.27898

The Takeaway

Unlike post-hoc verification or expensive retraining, SAGE intervenes in real time by identifying when the model is over-attending to semantically weak tokens (attention sinks). It offers a training-free way to improve VLM reliability, grounding generation in visual features only when needed.
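The idea can be sketched in a few lines: if a decoding step places an outsized share of attention mass on a known sink position (e.g. the start-of-sequence token), upweight the image-token positions and renormalize. This is a hypothetical illustration, not the paper's actual algorithm; the function name, threshold, and boost factor are all assumptions for the sketch.

```python
import numpy as np

def sage_style_rescale(attn, image_idx, sink_idx=0, sink_thresh=0.4, boost=1.5):
    """Hypothetical sink-aware modulation of one attention distribution.

    attn      : 1-D array of attention weights over the context (sums to 1)
    image_idx : positions of visual (image) tokens
    sink_idx  : position of the suspected attention sink (e.g. BOS token)
    """
    attn = attn.copy()
    # Only intervene when the sink dominates the attention distribution.
    if attn[sink_idx] > sink_thresh:
        attn[image_idx] *= boost   # upweight visual tokens
        attn /= attn.sum()         # renormalize back to a distribution
    return attn
```

When no sink is detected, the distribution is returned unchanged, which matches the summary's "only when needed" framing: the intervention is conditional, not applied at every step.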

From the abstract

Large vision-language models (VLMs) frequently suffer from hallucinations, generating content that is inconsistent with visual inputs. Existing methods typically address this problem through post-hoc filtering, additional training objectives, or external verification, but they do not intervene during the decoding process when hallucinations arise. In this work, we introduce SAGE, a Sink-Aware Grounded Decoding framework that mitigates hallucinations by dynamically modulating self-attention during the decoding process.