Your Vision-Language Models aren't just hallucinating; they suffer from 'semantic fixation' that makes them ignore your explicit instructions.
April 17, 2026
Original Paper
Beyond Perception Errors: Semantic Fixation in Large Vision-Language Models
arXiv · 2604.12119
The Takeaway
We often blame 'bad data' for VLM failures, but this research shows a deeper flaw: models cling to default interpretations even when explicitly told that a scene's rules have changed. They stubbornly fail to override common-sense priors with prompt-specified logic. This means current models are fundamentally limited in 'what-if' scenarios and specialized environments that don't match the training distribution. Practitioners need to move beyond better prompting and start looking for ways to break these rigid internal world-models. It highlights a critical barrier to making AI truly adaptable to novel, user-defined realities.
From the abstract
Large vision-language models (VLMs) often rely on familiar semantic priors, but existing evaluations do not cleanly separate perception failures from rule-mapping failures. We study this behavior as semantic fixation: preserving a default interpretation even when the prompt specifies an alternative, equally valid mapping. To isolate this effect, we introduce VLM-Fix, a controlled benchmark over four abstract strategy games that evaluates identical terminal board states under paired standard and […]
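To make the paired-evaluation idea concrete, here is a minimal Python sketch of how such a probe could be scored: the same board image is queried under a standard-rules prompt and an explicitly remapped-rules prompt, and answers that track the standard rules despite the remapped prompt are counted as fixation. The PairedCase fields, query_vlm stub, and the fixation counter are illustrative assumptions, not the paper's actual benchmark code.

from dataclasses import dataclass

@dataclass
class PairedCase:
    board_image: str      # path to a rendered terminal board state
    standard_prompt: str  # the game's default rules, e.g. "three in a row wins"
    remapped_prompt: str  # same scene, explicitly altered rules, e.g. "three in a row loses"
    standard_answer: str  # ground-truth outcome under the standard rules
    remapped_answer: str  # ground-truth outcome under the remapped rules

def query_vlm(image_path: str, prompt: str) -> str:
    """Placeholder for your model call (hosted API, local pipeline, etc.)."""
    raise NotImplementedError

def evaluate(cases: list[PairedCase]) -> dict:
    stats = {"standard_correct": 0, "remapped_correct": 0, "fixated": 0}
    for case in cases:
        std = query_vlm(case.board_image, case.standard_prompt)
        alt = query_vlm(case.board_image, case.remapped_prompt)
        stats["standard_correct"] += (std == case.standard_answer)
        stats["remapped_correct"] += (alt == case.remapped_answer)
        # Semantic fixation: the model answers the remapped prompt
        # as if the standard rules still applied.
        stats["fixated"] += (alt == case.standard_answer and
                             alt != case.remapped_answer)
    return stats

Under this framing, a high fixated count alongside a high standard_correct count is the telling signature: perception succeeded, but the prompt's alternative mapping was ignored.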