You can make an advanced Vision-Language Model hallucinate wildly just by changing the lights in the room.
April 16, 2026
Original Paper
Challenging Vision-Language Models with Physically Deployable Multimodal Semantic Lighting Attacks
arXiv · 2604.12833
The Takeaway
We've seen 'adversarial stickers,' but this paper introduces 'Multimodal Semantic Lighting Attacks.' By controlling the physical lighting in a scene, the researchers tricked widely used models like LLaVA into seeing things that aren't there or completely misinterpreting the scene. This is a serious practical threat because it requires no access to the model or its software pipeline, just a controllable light such as a smart bulb. It shows that a model's picture of reality is surprisingly fragile to environmental conditions. For anyone deploying AI in the real world (security cameras, delivery robots), this is a critical new vulnerability: it shifts the security focus from digital patches to hardening the physical environment.
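To make the threat model concrete, here is a minimal sketch, and not the paper's actual pipeline: it digitally approximates a lighting change by re-scaling an image's colour channels and asks the same LLaVA checkpoint the same question before and after. The model ID, the `apply_lighting` gains, and the test image are illustrative assumptions; the attack described in the paper perturbs the scene with real, controllable lights before the camera ever captures it.

```python
# Rough digital stand-in for a lighting attack: "re-light" an image and check
# whether a VLM's answer changes. The paper uses real, controllable lights;
# scaling RGB channels here is only a crude proxy for that.
import numpy as np
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

MODEL_ID = "llava-hf/llava-1.5-7b-hf"  # any LLaVA-style checkpoint would do
processor = AutoProcessor.from_pretrained(MODEL_ID)
model = LlavaForConditionalGeneration.from_pretrained(MODEL_ID)

def apply_lighting(image: Image.Image, rgb_gain=(1.4, 0.9, 0.6), brightness=0.8) -> Image.Image:
    """Crudely simulate a warm, dim light by scaling the colour channels."""
    arr = np.asarray(image).astype(np.float32)
    arr *= np.array(rgb_gain) * brightness          # per-channel gain
    return Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))

def ask(image: Image.Image, question: str) -> str:
    """Query the VLM with a single image-grounded question."""
    prompt = f"USER: <image>\n{question} ASSISTANT:"
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=64)
    return processor.decode(out[0], skip_special_tokens=True)

scene = Image.open("office.jpg").convert("RGB")     # placeholder test image
question = "Is there a person in this room?"
print("original :", ask(scene, question))
print("re-lit   :", ask(apply_lighting(scene), question))
```

If the two answers diverge, the model's output was steered by nothing more than a lighting change, which is exactly the kind of fragility the paper exploits physically.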
From the abstract
Vision-Language Models (VLMs) have shown remarkable performance, yet their security remains insufficiently understood. Existing adversarial studies focus almost exclusively on the digital setting, leaving physical-world threats largely unexplored. As VLMs are increasingly deployed in real environments, this gap becomes critical, since adversarial perturbations must be physically realizable. Despite this practical relevance, physical attacks against VLMs have not been systematically studied.