Masked Diffusion Language Models (MDLMs) fail at reasoning because they unmask tokens in the wrong order, not because they lack internal logic.
March 31, 2026
Original Paper
LogicDiff: Logic-Guided Denoising Improves Reasoning in Masked Diffusion Language Models
arXiv · 2603.26771
The Takeaway
The paper shows that simply changing the inference-time unmasking sequence to follow logical dependencies (premises first, then connectives) boosts GSM8K accuracy from 22% to 60.7% without any retraining. This challenges the assumption that MDLMs need massive RL or parameter scaling to match autoregressive reasoning performance.
From the abstract
Masked diffusion language models (MDLMs) generate text by iteratively unmasking tokens from a fully masked sequence, offering parallel generation and bidirectional context. However, their standard confidence-based unmasking strategy systematically defers high-entropy logical connective tokens, the critical branching points in reasoning chains, leading to severely degraded reasoning performance. We introduce LogicDiff, an inference-time method that replaces confidence-based unmasking with logic-r…
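To make the core idea concrete, here is a toy sketch of the two unmasking schedules the abstract contrasts. This is my reconstruction, not the paper's code: the token roles, the priority ordering, and the helper functions (`confidence_order`, `logic_order`) are illustrative assumptions. The point it demonstrates is that a confidence-based schedule defers low-confidence (high-entropy) connectives to the end, while a logic-guided schedule unmasks premises before the connectives that combine them.

```python
# Toy sketch of the unmasking-order idea (illustrative assumptions, not the
# paper's implementation). Positions are unmasked in the order returned.

def confidence_order(confidences):
    """Standard MDLM schedule: unmask highest-confidence positions first.
    High-entropy logical connectives get low confidence, so they come last."""
    return sorted(range(len(confidences)), key=lambda i: -confidences[i])

def logic_order(roles):
    """Logic-guided schedule (assumed role taxonomy): unmask premises first,
    then the connectives that combine them, then the conclusion."""
    priority = {"premise": 0, "connective": 1, "conclusion": 2}
    return sorted(range(len(roles)), key=lambda i: priority[roles[i]])

# A 5-position reasoning chain: premise, premise, connective, premise, conclusion.
roles = ["premise", "premise", "connective", "premise", "conclusion"]
# Hypothetical per-position model confidences; the connective (index 2) is
# high-entropy and therefore low-confidence.
conf = [0.9, 0.8, 0.2, 0.85, 0.4]

print("confidence-based:", confidence_order(conf))  # → [0, 3, 1, 4, 2]
print("logic-guided:   ", logic_order(roles))       # → [0, 1, 3, 2, 4]
```

Under the confidence schedule, the connective at index 2 is unmasked last, after the conclusion it is supposed to govern; the logic schedule instead commits to it as soon as its premises are in place, which is the ordering change the paper credits for the reasoning gains.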