MemDLM embeds a simulated denoising process into training to create 'Parametric Memory,' narrowing the train-inference gap for Diffusion Language Models.
March 24, 2026
Original Paper
MemDLM: Memory-Enhanced DLM Training
arXiv · 2603.22241
The Takeaway
MemDLM enables diffusion models to act as emergent in-weight retrieval mechanisms. This significantly improves performance on long-context tasks and 'Needle-in-a-Haystack' retrieval, making DLMs a more viable alternative to autoregressive Transformers for complex reasoning.
From the abstract
Diffusion Language Models (DLMs) offer attractive advantages over Auto-Regressive (AR) models, such as full-attention parallel decoding and flexible generation. However, they suffer from a notable train-inference mismatch: DLMs are trained with a static, single-step masked prediction objective, but deployed through a multi-step progressive denoising trajectory. We propose MemDLM (Memory-Enhanced DLM), which narrows this gap by embedding a simulated denoising process into training via Bi-level Op […]
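To make the mismatch concrete, here is a minimal toy sketch (not the paper's actual algorithm; the `toy_model`, `train_step`, and `generate` names are illustrative) contrasting the static single-step masked objective used in training with the multi-step progressive denoising loop used at inference:

```python
import random

MASK = "<mask>"

def toy_model(tokens, reference):
    # Stand-in for a trained DLM: fills each masked position from a
    # reference sequence. A real model would predict these tokens.
    return [ref if tok == MASK else tok for tok, ref in zip(tokens, reference)]

def train_step(sentence, mask_ratio=0.5, rng=random):
    # Training objective (single step): mask a random subset once,
    # then predict ALL masked positions in one shot.
    masked = [MASK if rng.random() < mask_ratio else t for t in sentence]
    return toy_model(masked, sentence)

def generate(length, reference, steps=4):
    # Inference (multi-step denoising): start fully masked and commit
    # only a few predictions per step, remasking the rest -- a trajectory
    # the single-step training objective never simulates.
    tokens = [MASK] * length
    per_step = max(1, length // steps)
    for _ in range(steps):
        preds = toy_model(tokens, reference)
        masked_idx = [i for i, t in enumerate(tokens) if t == MASK]
        for i in masked_idx[:per_step]:
            tokens[i] = preds[i]
        if MASK not in tokens:
            break
    return tokens
```

The gap MemDLM targets is that `train_step` sees only one masking pattern per example, while `generate` walks through a sequence of intermediate, partially-unmasked states the model was never explicitly trained on.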