Dangerous models trained on illegal content can be caught without ever generating a single harmful image.
April 29, 2026
Original Paper
Evaluation without Generation: Non-Generative Assessment of Harmful Model Specialization with Applications to CSAM
arXiv · 2604.25119
The Takeaway
Detecting whether a model has been fine-tuned on harmful data usually requires trying to force the model to produce that data. Gaussian probing sidesteps this ethical dilemma by analyzing the model's internal representations instead. The method identifies characteristic shifts in latent space that occur when a model is specialized for harmful tasks, allowing auditors to establish that a model is dangerous while complying with laws that prohibit possessing or generating the material itself. It provides a non-generative path for model auditing and safety enforcement in highly sensitive domains.
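To make the idea concrete, here is a minimal sketch of one way a latent-shift check could work. This is an illustration with synthetic data, not the paper's actual method: it assumes we can collect hidden-layer activations on benign probe inputs, fits a Gaussian to the base model's activations, and scores a candidate fine-tune by the Mahalanobis distance of its activations under that Gaussian. A specialized model's representations drift away from the base distribution, so its score rises. All function names and thresholds here are hypothetical.

```python
import numpy as np

def fit_gaussian(acts):
    """Fit a Gaussian (mean, regularized covariance) to activation vectors."""
    mu = acts.mean(axis=0)
    cov = np.cov(acts, rowvar=False) + 1e-6 * np.eye(acts.shape[1])
    return mu, cov

def mahalanobis_shift(base_acts, candidate_acts):
    """Mean Mahalanobis distance of candidate activations under the
    Gaussian fitted to the base model's activations on the same probes."""
    mu, cov = fit_gaussian(base_acts)
    inv = np.linalg.inv(cov)
    diffs = candidate_acts - mu
    d2 = np.einsum("ij,jk,ik->i", diffs, inv, diffs)  # per-sample squared distance
    return float(np.sqrt(d2).mean())

# Synthetic stand-ins for hidden-layer activations on benign probe prompts.
rng = np.random.default_rng(0)
dim = 8
base_acts = rng.normal(size=(500, dim))                       # base model
benign_ft = base_acts[:200] + rng.normal(scale=0.05, size=(200, dim))  # harmless fine-tune
shifted_ft = rng.normal(loc=1.5, size=(200, dim))             # specialized fine-tune

print(mahalanobis_shift(base_acts, benign_ft))   # stays near the base distribution
print(mahalanobis_shift(base_acts, shifted_ft))  # markedly larger: latent shift
```

Note that the audit only ever feeds the models benign inputs and inspects activations; no harmful output is generated at any point, which is what makes this style of probing compatible with the legal constraints the paper describes.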
From the abstract
Auditing the fine-tunes of open-weight generative models for harmful specialization has become a new governance challenge for model hosting platforms. The standard toolkit, generative evaluation via curated prompts or red-teaming, does not scale to platform-level auditing and breaks down entirely for domains like CSAM where generation is legally constrained. This motivates the Evaluation without Generation problem: assessing model capabilities without producing outputs. We argue that in such set