AI & ML Nature Is Weird

Multimodal AI backdoors hide inside a specific mathematical subspace of the projector rather than in the text neurons.

April 24, 2026

Original Paper

ProjLens: Unveiling the Role of Projectors in Multimodal Model Safety

arXiv · 2604.19083

The Takeaway

Most defenses search for backdoor triggers in the main layers of a neural network. This research shows that visual backdoor triggers are instead encoded in the low-rank subspace of the projector that connects image features to the text embedding space. These triggers activate through a semantic shift that scales linearly with the input norm, a mechanism fundamentally different from the backdoors found in text-only models. Identifying these visual vulnerabilities therefore requires new mathematical tools focused on the projector interface.
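The linear-scaling behavior is easy to see numerically: for any linear projector, the component of the output that lands in a fixed low-rank subspace grows in exact proportion to the input norm. The sketch below illustrates this with a random weight matrix; the dimensions, the rank of the subspace, and the variable names are all hypothetical and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical projector weight: maps 64-dim vision features
# into a 32-dim text embedding space.
W = rng.standard_normal((32, 64))

# The top singular directions span a dominant low-rank subspace
# of the projector, the kind of subspace the paper examines.
U, S, Vt = np.linalg.svd(W, full_matrices=False)
k = 4
readout = U[:, :k]  # output directions of the top-k subspace

def subspace_shift(x):
    """Magnitude of the projected output within the top-k output subspace."""
    y = W @ x
    return np.linalg.norm(readout.T @ y)

x = rng.standard_normal(64)
s1 = subspace_shift(x)
s2 = subspace_shift(2.0 * x)  # doubling the input norm...
print(s2 / s1)                # ...doubles the shift in the subspace
```

Because `W` is linear, the ratio is exactly 2: the semantic shift along the subspace tracks the input norm linearly, which is the signature the takeaway describes.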

From the abstract

Multimodal Large Language Models (MLLMs) have achieved remarkable success in cross-modal understanding and generation, yet their deployment is threatened by critical safety vulnerabilities. While prior works have demonstrated the feasibility of backdoors in MLLMs via fine-tuning data poisoning to manipulate inference, the underlying mechanisms of backdoor attacks remain opaque, complicating the understanding and mitigation. To bridge this gap, we propose ProjLens, an interpretability framework d