Concept erasure in text-to-image models is largely a facade that can be bypassed using text-free inversion attacks.
March 19, 2026
Original Paper
TINA: Text-Free Inversion Attack for Unlearned Text-to-Image Diffusion Models
arXiv · 2603.17828
The Takeaway
This research demonstrates that current 'concept erasure' safety techniques merely sever the text-to-image mapping while leaving the underlying visual knowledge intact. This challenges the validity of existing model unlearning benchmarks and necessitates a move toward visual-centric erasure methods.
From the abstract
Although text-to-image diffusion models exhibit remarkable generative power, concept erasure techniques are essential for their safe deployment to prevent the creation of harmful content. This has fostered a dynamic interplay between the development of erasure defenses and the adversarial probes designed to bypass them, and this co-evolution has progressively enhanced the efficacy of erasure methods. However, this adversarial co-evolution has converged on a narrow, text-centric paradigm that equ…