AI & ML Paradigm Challenge

A machine unlearning process can't erase a legal violation that happened the moment training began.

April 24, 2026

Original Paper

Position: No Retroactive Cure for Infringement during Training

arXiv · 2604.18649

The Takeaway

Copyright liability attaches at the moment protected data is ingested during training. Many companies assume that removing specific data later, or filtering outputs, will fix their legal exposure. This research argues that unlearning is not a retroactive cure for unauthorized data use: legal frameworks focus on the initial unauthorized copy, not the final model state. Developers must therefore solve the data licensing problem before the first epoch of training begins.

From the abstract

As generative AI faces intensifying legal challenges, the machine learning community has increasingly relied on post-hoc mitigation -- especially machine unlearning and inference-time guardrails -- to argue for compliance. This paper argues that such post-hoc mitigation methods cannot retroactively cure liability from unlawful acquisition and training, because compliance hinges on data lineage, not the outputs. Our argument has three parts. First, unauthorized copying/ingestion can be a legally […]