Quantifies near-verbatim data extraction risk in LLMs at 1/5000th the computational cost of standard Monte Carlo methods.
March 27, 2026
Original Paper
Estimating near-verbatim extraction risk in language models with decoding-constrained beam search
arXiv · 2603.24917
The Takeaway
Decoding-constrained beam search yields a deterministic lower bound on how likely a model is to leak training data, giving organizations a practical, efficient tool to audit models for privacy and copyright compliance before deployment.
From the abstract
Recent work shows that standard greedy-decoding extraction methods for quantifying memorization in LLMs miss how extraction risk varies across sequences. Probabilistic extraction -- computing the probability of generating a target suffix given a prefix under a decoding scheme -- addresses this, but is tractable only for verbatim memorization, missing near-verbatim instances that pose similar privacy and copyright risks. Quantifying near-verbatim extraction risk is expensive: the set of near-verbatim …
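To make the core idea concrete, here is a minimal sketch of how a beam search can lower-bound near-verbatim extraction risk. Everything below is illustrative, not the paper's method: the toy next-token model, the Hamming-distance definition of "near-verbatim", and the beam width are all assumptions. The key property it demonstrates is that summing the probabilities of only the near-verbatim completions that survive the beam can never overshoot the exact (exponential-cost) sum, so the result is a deterministic lower bound.

```python
import heapq
from itertools import product

VOCAB = "ab"

def next_token_probs(context):
    """Toy stand-in for p(token | prefix, context); a real audit would
    query the language model here. Slightly favors 'a'."""
    return {"a": 0.6, "b": 0.4}

def seq_prob(seq):
    """Probability of generating `seq` token by token under the toy model."""
    p = 1.0
    for i, tok in enumerate(seq):
        p *= next_token_probs(seq[:i])[tok]
    return p

def near_verbatim(seq, target, max_dist=1):
    """Illustrative notion of near-verbatim: same length, Hamming
    distance at most `max_dist` from the target suffix."""
    return sum(a != b for a, b in zip(seq, target)) <= max_dist

def exact_risk(target, max_dist=1):
    """Brute-force sum over all suffixes: exponential in len(target),
    which is exactly what makes the problem expensive."""
    return sum(
        seq_prob("".join(toks))
        for toks in product(VOCAB, repeat=len(target))
        if near_verbatim("".join(toks), target, max_dist)
    )

def beam_lower_bound(target, beam_width=2, max_dist=1):
    """Keep only the `beam_width` most probable partial sequences at each
    step. Any near-verbatim completions that survive form a subset of the
    full near-verbatim set, so their total probability is a lower bound."""
    beams = [("", 1.0)]
    for _ in range(len(target)):
        candidates = [
            (seq + tok, p * q)
            for seq, p in beams
            for tok, q in next_token_probs(seq).items()
        ]
        beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[1])
    return sum(p for seq, p in beams if near_verbatim(seq, target, max_dist))

target = "aab"
lb = beam_lower_bound(target, beam_width=2)
ex = exact_risk(target)
assert lb <= ex + 1e-12  # the beam estimate never exceeds the exact risk
```

The efficiency gain comes from the beam: the exact sum visits |V|^n suffixes, while the beam search scores only beam_width * |V| candidates per step, at the cost of reporting a (still sound) lower bound rather than the exact risk.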