Identifies a fundamental quality-exploration dilemma in diffusion language models, where low-confidence remasking improves single-sample quality but suppresses reasoning diversity.
April 2, 2026
Original Paper
Locally Confident, Globally Stuck: The Quality-Exploration Dilemma in Diffusion Language Models
arXiv · 2604.00375
The Takeaway
The paper characterizes why dLLMs struggle to convert their decoding flexibility into multi-sample gains (Pass@k) and proposes an Independent Metropolis-Hastings sampler that recovers diverse reasoning paths, outperforming both standard autoregressive decoding and random-remasking baselines.
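The digest names the Independent Metropolis-Hastings (IMH) sampler without spelling out its mechanics, so here is a minimal generic sketch of the technique: candidates are drawn independently of the current state (for a dLLM, one full decode per proposal) and accepted or rejected so that the chain targets a desired distribution. The hooks `sample_proposal`, `log_p`, and `log_q`, and the toy target below, are illustrative assumptions, not the paper's interface.

```python
import math
import random

def independent_mh(sample_proposal, log_p, log_q, n_steps):
    """Independent Metropolis-Hastings over complete samples.

    sample_proposal() draws a fresh candidate independently of the
    current state; log_p and log_q score a sample under the target
    and the proposal. These hooks are hypothetical stand-ins.

    Acceptance for an independent proposal x -> x':
        a = min(1, p(x') q(x) / (p(x) q(x')))
    """
    x = sample_proposal()
    chain = [x]
    for _ in range(n_steps):
        x_new = sample_proposal()
        # Log of the importance-weight ratio w(x') / w(x), w = p / q.
        log_a = (log_p(x_new) - log_q(x_new)) - (log_p(x) - log_q(x))
        if math.log(random.random() + 1e-300) < log_a:
            x = x_new  # accept; otherwise keep x (repeated in the chain)
        chain.append(x)
    return chain

# Toy sanity check: uniform proposal over {0..9}, target favoring
# small integers (p(x) proportional to 0.5 ** (x + 1)).
random.seed(0)
log_p = lambda x: (x + 1) * math.log(0.5)
log_q = lambda x: -math.log(10)
chain = independent_mh(lambda: random.randrange(10), log_p, log_q, 5000)
```

Because each proposal ignores the current state, a rejected draw simply repeats the previous sample; the chain still targets p while letting diverse high-quality proposals through, which is the exploration-preserving property the takeaway points to.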
From the abstract
Diffusion large language models (dLLMs) theoretically permit token decoding in arbitrary order, a flexibility that could enable richer exploration of reasoning paths than autoregressive (AR) LLMs. In practice, however, random-order decoding often hurts generation quality. To mitigate this, low-confidence remasking improves single-sample quality (e.g., Pass@1) by prioritizing confident tokens, but it also suppresses exploration and limits multi-sample gains (e.g., Pass@k), creating a fundamental quality-exploration dilemma.
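To make the low-confidence remasking mechanism from the abstract concrete, below is a minimal sketch of one decoding step: commit the most-confident masked positions to their predicted tokens and keep the rest masked. The names `MASK_ID` and `low_confidence_remask_step`, and the fixed `probs` in the demo, are hypothetical simplifications; a real dLLM recomputes per-position predictions from the model at every step.

```python
import numpy as np

MASK_ID = -1  # hypothetical sentinel for a still-masked position

def low_confidence_remask_step(probs, tokens, n_unmask):
    """One low-confidence remasking step.

    probs:    (L, V) per-position predicted token distributions
    tokens:   (L,) current sequence; MASK_ID marks masked slots
    n_unmask: number of positions to commit this step

    Commits the masked positions whose max predicted probability
    (confidence) is highest; all other positions stay masked.
    """
    masked = tokens == MASK_ID
    conf = probs.max(axis=-1)
    conf[~masked] = -np.inf                 # only masked slots compete
    n = min(n_unmask, int(masked.sum()))    # don't overshoot at the end
    commit = np.argsort(-conf)[:n]          # most-confident first
    new_tokens = tokens.copy()
    new_tokens[commit] = probs[commit].argmax(axis=-1)
    return new_tokens

# Demo with frozen fake predictions (a real model would refresh probs
# after each step, conditioning on the newly committed tokens).
rng = np.random.default_rng(0)
L, V = 8, 50
probs = rng.dirichlet(np.ones(V), size=L)
tokens = np.full(L, MASK_ID)
while (tokens == MASK_ID).any():
    tokens = low_confidence_remask_step(probs, tokens, n_unmask=2)
```

The sketch makes the dilemma visible: because only the highest-confidence positions are ever committed, repeated samples tend to follow the same confident trajectory, which is exactly the loss of exploration the abstract describes.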