Enables semantically precise model editing directly in weight space, without any training data.
March 27, 2026
Original Paper
From Weights to Concepts: Data-Free Interpretability of CLIP via Singular Vector Decomposition
arXiv · 2603.24653
The Takeaway
By decomposing CLIP's attention heads using SVD and interpreting them via sparse concept mapping, practitioners can now suppress or amplify specific visual concepts by modifying weights directly, bypassing the need for expensive fine-tuning.
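As a rough illustration of the idea (not the paper's actual pipeline), the sketch below decomposes a single attention-head weight matrix with SVD and suppresses one singular direction by zeroing its singular value. The matrix shape, the chosen index `k`, and the assumption that direction `k` was already mapped to a target concept are all hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-head weight matrix (d_model x d_head); in practice this
# would be read out of CLIP's vision transformer, not sampled randomly.
d_model, d_head = 768, 64
W = rng.standard_normal((d_model, d_head))

# Decompose the head's weights: W = U @ diag(S) @ Vt.
U, S, Vt = np.linalg.svd(W, full_matrices=False)

# Suppose singular direction k has been identified (e.g. via a sparse
# concept mapping) with a visual concept we want to suppress. Zeroing
# its singular value removes that direction from the head's output.
k = 3
S_edit = S.copy()
S_edit[k] = 0.0

# Reconstruct the edited weights; writing them back into the model
# edits behavior with no training data and no fine-tuning.
W_edit = U @ np.diag(S_edit) @ Vt

# Sanity checks: rank drops by one, and the edited matrix no longer
# has any component along the suppressed left-singular direction.
assert np.linalg.matrix_rank(W_edit) == min(d_model, d_head) - 1
assert np.allclose(U[:, k] @ W_edit, 0.0)
```

Amplifying a concept would work the same way, scaling `S_edit[k]` up instead of zeroing it; the edit is a pure weight-space operation, which is what makes the approach data-free.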
From the abstract
As vision-language models are deployed at scale, understanding their internal mechanisms becomes increasingly critical. Existing interpretability methods predominantly rely on activations, making them dataset-dependent, vulnerable to data bias, and often restricted to coarse head-level explanations. We introduce SITH (Semantic Inspection of Transformer Heads), a fully data-free, training-free framework that directly analyzes CLIP's vision transformer in weight space. For each attention head, we