
A decade-old visual glitch makes the world's most powerful AI models confidently lie about what they are seeing.

Simple adversarial perturbations, a class of attack first documented roughly a decade ago, can force models like GPT-5.4 and Claude Opus 4.6 to describe an image they were never effectively shown, and to do so with complete confidence. These models are increasingly deployed as trusted fact-checkers, yet they remain vulnerable to pixel-level tweaks that a human eye would never notice. Attackers can exploit this gap through what the authors call AI authority laundering: borrowing the assistant's perceived trustworthiness to give misinformation a credible, reasonable-sounding voice. The consequence is that even the most advanced vision-language systems cannot be relied on for automated verification in high-stakes settings, because the very models meant to protect users from falsehoods can be turned into a channel for spreading them.
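
To make the mechanism concrete, here is a minimal sketch of the kind of imperceptible perturbation involved: a projected-gradient-descent attack that nudges an image toward an attacker-chosen label while keeping every pixel within a tiny budget. The PyTorch classifier, the label tensor, and the epsilon value below are illustrative assumptions, not the paper's actual attack pipeline against a vision-language model.

```python
# Minimal sketch (assumption: a PyTorch image classifier, not the paper's VLM
# pipeline) of an L-infinity-bounded targeted perturbation via projected
# gradient descent. No pixel moves by more than `eps`, so the change stays
# imperceptible, yet the model is steered toward an attacker-chosen label.
import torch
import torch.nn.functional as F

def pgd_targeted(model, image, wrong_label, eps=4 / 255, step=1 / 255, iters=40):
    # `image`: float tensor of shape (1, 3, H, W) with values in [0, 1];
    # `wrong_label`: long tensor of shape (1,) holding the target class.
    adv = image.clone().detach()
    for _ in range(iters):
        adv.requires_grad_(True)
        loss = F.cross_entropy(model(adv), wrong_label)
        (grad,) = torch.autograd.grad(loss, adv)
        # Descend the loss for the *wrong* label, then project back into the
        # eps-ball around the original image and the valid pixel range.
        adv = adv.detach() - step * grad.sign()
        adv = image + (adv - image).clamp(-eps, eps)
        adv = adv.clamp(0.0, 1.0)
    return adv.detach()
```

The paper's setting targets a VLM's text output rather than a classifier label, but the same idea of an optimization constrained to an invisible pixel budget is what keeps the altered image indistinguishable to a human reviewer.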

Original Paper

Laundering AI Authority with Adversarial Examples

Jie Zhang, Pura Peetathawatchai, Florian Tramèr, Avital Shafran

arXiv  ·  2605.04261

Vision-language models (VLMs) are increasingly deployed as trusted authorities: fact-checking images on social media, comparing products, and moderating content. Users implicitly trust that these systems perceive the same visual content as they do. We show that adversarial examples break this assumption, enabling AI authority laundering: an attacker subtly perturbs an image so that the VLM produces confident and authoritative responses about the wrong input. Unlike jailbreaks or …