Your Vision-Language Models aren't just hallucinating; they suffer from 'semantic fixation' that makes them ignore your explicit instructions.
April 17, 2026
Original Paper
Beyond Perception Errors: Semantic Fixation in Large Vision-Language Models
arXiv · 2604.12119
The Takeaway
We often blame 'bad data' for VLM failures, but this research shows a deeper flaw: models cling to default interpretations even when explicitly told that a scene's rules have changed. They stubbornly fail to override common-sense priors with prompt-specified logic. This means current models are fundamentally limited in 'what-if' scenarios and specialized environments that don't match the training distribution. Practitioners need to move beyond better prompting and start looking for ways to break these rigid internal world-models. It highlights a critical barrier to making AI truly adaptable to novel, user-defined realities.
From the abstract
Large vision-language models (VLMs) often rely on familiar semantic priors, but existing evaluations do not cleanly separate perception failures from rule-mapping failures. We study this behavior as semantic fixation: preserving a default interpretation even when the prompt specifies an alternative, equally valid mapping. To isolate this effect, we introduce VLM-Fix, a controlled benchmark over four abstract strategy games that evaluates identical terminal board states under paired standard and […]
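To make the paired-evaluation idea concrete, here is a minimal Python sketch of how such a probe could be scored: the same board image is queried under a standard-rules prompt and an explicitly remapped-rules prompt, and answers that track the standard rules despite the remapped prompt are counted as fixation. The PairedCase fields, query_vlm stub, and the fixation counter are illustrative assumptions, not the paper's actual benchmark code.

from dataclasses import dataclass

@dataclass
class PairedCase:
    board_image: str      # path to a rendered terminal board state
    standard_prompt: str  # the game's default rules, e.g. "three in a row wins"
    remapped_prompt: str  # same scene, explicitly altered rules, e.g. "three in a row loses"
    standard_answer: str  # ground-truth outcome under the standard rules
    remapped_answer: str  # ground-truth outcome under the remapped rules

def query_vlm(image_path: str, prompt: str) -> str:
    """Placeholder for your model call (hosted API, local pipeline, etc.)."""
    raise NotImplementedError

def evaluate(cases: list[PairedCase]) -> dict:
    stats = {"standard_correct": 0, "remapped_correct": 0, "fixated": 0}
    for case in cases:
        std = query_vlm(case.board_image, case.standard_prompt)
        alt = query_vlm(case.board_image, case.remapped_prompt)
        stats["standard_correct"] += (std == case.standard_answer)
        stats["remapped_correct"] += (alt == case.remapped_answer)
        # Semantic fixation: the model answers the remapped prompt
        # as if the standard rules still applied.
        stats["fixated"] += (alt == case.standard_answer and
                             alt != case.remapped_answer)
    return stats

Under this framing, a high fixated count alongside a high standard_correct count is the telling signature: perception succeeded, but the prompt's alternative mapping was ignored.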