Masked Image Modeling (MIM) representations retain non-semantic noise that hurts downstream performance, and a zero-cost post-hoc linear projection can suppress it.
April 2, 2026
Original Paper
Suppressing Non-Semantic Noise in Masked Image Modeling Representations
arXiv · 2604.00172
The Takeaway
The paper introduces SOAP (Semantically Orthogonal Artifact Projection), a training-free method that consistently improves zero-shot performance for MIM models such as MAE. This challenges the assumption that pre-training objectives alone produce clean semantic features, and it gives practitioners a 'free' performance boost.
From the abstract
Masked Image Modeling (MIM) has become a ubiquitous self-supervised vision paradigm. In this work, we show that MIM objectives cause the learned representations to retain non-semantic information, which ultimately hurts performance during inference. We introduce a model-agnostic score for semantic invariance using Principal Component Analysis (PCA) on real and synthetic non-semantic images. Based on this score, we propose a simple method, Semantically Orthogonal Artifact Projection (SOAP), to di…
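The excerpt cuts off before the paper's projection details, but the abstract suggests the general shape: fit PCA on features extracted from non-semantic images, then project model features onto the orthogonal complement of the dominant "artifact" directions. The following is a minimal numpy sketch under that assumption; the function names (`fit_artifact_basis`, `soap_project`) and the choice of `k` are hypothetical, not the paper's API.

```python
import numpy as np

def fit_artifact_basis(noise_feats: np.ndarray, k: int) -> np.ndarray:
    """PCA via SVD on features of non-semantic images.

    noise_feats: (n, d) feature vectors from non-semantic inputs.
    Returns the top-k principal directions as a (k, d) orthonormal matrix.
    """
    centered = noise_feats - noise_feats.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k]

def soap_project(feats: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Remove the artifact subspace: x - (x V^T) V for each row x."""
    return feats - feats @ basis.T @ basis

# Toy usage: 256 "non-semantic" feature vectors of dim 32; suppress the
# top-4 artifact directions, then clean a batch of 8 model features.
rng = np.random.default_rng(0)
noise_feats = rng.normal(size=(256, 32))
basis = fit_artifact_basis(noise_feats, k=4)
clean = soap_project(rng.normal(size=(8, 32)), basis)
# Projected features are orthogonal to every artifact direction.
assert np.allclose(clean @ basis.T, 0.0, atol=1e-8)
```

Since the projection is a single fixed linear map, it can be folded into the model's last linear layer at inference time, which is consistent with the headline's "zero-cost" framing.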