AI & ML Breaks Assumption

Reveals that 'erasing' concepts from video diffusion models only suppresses output rather than removing the underlying representations.

March 24, 2026

Original Paper

PROBE: Diagnosing Residual Concept Capacity in Erased Text-to-Video Diffusion Models

Yiwei Xie, Zheng Zhang, Ping Liu

arXiv · 2603.21547

The Takeaway

The PROBE protocol shows that sensitive concepts such as nudity can be reactivated in 'safe' models through a lightweight optimization that leaves all model parameters frozen. This exposes a gap in current safety auditing, which checks only whether a concept is absent from generated frames, and in concept erasure techniques for T2V models.
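To make the idea of reactivation with frozen weights concrete, here is a toy sketch. It is not the PROBE protocol and does not involve a diffusion model: it assumes a hypothetical frozen linear "concept detector" (`FROZEN_W`) and runs gradient ascent on the input vector only. Every name and number in it is invented for illustration.

```python
import math

# Toy stand-in for a frozen model: a fixed linear "concept detector".
# FROZEN_W is hypothetical and is never updated anywhere below.
FROZEN_W = [0.8, -0.5, 0.3, 0.9]


def concept_score(embedding):
    """Sigmoid of a frozen linear readout; the result lies in (0, 1)."""
    z = sum(w * x for w, x in zip(FROZEN_W, embedding))
    return 1.0 / (1.0 + math.exp(-z))


def optimize_embedding(start, steps=200, lr=0.5):
    """Gradient ascent on log(concept_score) with respect to the input only."""
    e = list(start)
    for _ in range(steps):
        s = concept_score(e)
        # d/de log(sigmoid(w . e)) = (1 - s) * w
        e = [x + lr * (1.0 - s) * w for x, w in zip(e, FROZEN_W)]
    return e


# An input that the toy detector scores as low-concept ("suppressed").
suppressed = [-1.0, 1.0, -1.0, -1.0]
before = concept_score(suppressed)
recovered = optimize_embedding(suppressed)
after = concept_score(recovered)
print(f"score before: {before:.3f}, after: {after:.3f}")
```

The weights never change, yet the score rises because the capacity to respond to the concept is still present in them. That is the distinction the paper draws between output-level suppression and representational removal, shown here only in a drastically simplified setting.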

From the abstract

Concept erasure techniques for text-to-video (T2V) diffusion models report substantial suppression of sensitive content, yet current evaluation is limited to checking whether the target concept is absent from generated frames, treating output-level suppression as evidence of representational removal. We introduce PROBE, a diagnostic protocol that quantifies the reactivation potential of erased concepts in T2V models. With all model parameters frozen, PROBE optimizes a lightweight pseudo