AI & ML New Capability

MemDLM embeds a simulated denoising process into training to create 'Parametric Memory,' narrowing the train-inference gap for Diffusion Language Models.

March 24, 2026

Original Paper

MemDLM: Memory-Enhanced DLM Training

Zehua Pei, Hui-Ling Zhen, Weizhe Lin, Sinno Jialin Pan, Yunhe Wang, Mingxuan Yuan, Bei Yu

arXiv · 2603.22241

The Takeaway

MemDLM enables diffusion models to act as emergent in-weight retrieval mechanisms. This significantly improves performance on long-context tasks and 'Needle-in-a-Haystack' retrieval, making DLMs a more viable alternative to autoregressive (AR) models for complex reasoning.

From the abstract

Diffusion Language Models (DLMs) offer attractive advantages over Auto-Regressive (AR) models, such as full-attention parallel decoding and flexible generation. However, they suffer from a notable train-inference mismatch: DLMs are trained with a static, single-step masked prediction objective, but deployed through a multi-step progressive denoising trajectory. We propose MemDLM (Memory-Enhanced DLM), which narrows this gap by embedding a simulated denoising process into training via Bi-level Op…
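The mismatch the abstract describes can be made concrete with a toy sketch: training sees a single masked-prediction pass at a random mask ratio, while generation runs a multi-step loop that progressively unmasks tokens. Everything below (the `toy_predict` stand-in for the denoiser, the confidence-based unmasking schedule) is a hypothetical illustration, not the paper's actual objective or architecture.

```python
import random

MASK = "<mask>"

def toy_predict(tokens, vocab=("a", "b")):
    # Stand-in for a DLM denoiser: fills each masked slot with a
    # (here random) token plus a confidence score. Hypothetical.
    return [(random.choice(vocab), random.random()) if t == MASK else (t, 1.0)
            for t in tokens]

def train_step(sequence, mask_ratio=0.5):
    # Static single-step objective: mask a random subset once,
    # then predict ALL masked positions in one forward pass.
    masked = [MASK if random.random() < mask_ratio else t for t in sequence]
    return [t for t, _ in toy_predict(masked)]

def generate(length, steps=4):
    # Inference-time trajectory: start fully masked, and at each step
    # commit only the highest-confidence predictions, re-predicting
    # the rest — a regime the single-step objective never saw.
    tokens = [MASK] * length
    per_step = max(1, length // steps)
    while MASK in tokens:
        preds = toy_predict(tokens)
        still_masked = sorted((i for i, t in enumerate(tokens) if t == MASK),
                              key=lambda i: -preds[i][1])
        for i in still_masked[:per_step]:
            tokens[i] = preds[i][0]
    return tokens
```

Under this reading, MemDLM's move of simulating the multi-step denoising loop *inside* training would expose the model to the partially-denoised inputs it actually encounters at inference time, rather than only to one-shot random maskings.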