AI & ML Paradigm Shift

Replaces standard autoregressive action generation in robot VLAs with iterative refinement via discrete flow matching.

March 30, 2026

Original Paper

DFM-VLA: Iterative Action Refinement for Robot Manipulation via Discrete Flow Matching

Jiayi Chen, Wenxuan Song, Shuai Chen, Jingbo Wang, Zhijun Li, Haoang Li

arXiv · 2603.26320

The Takeaway

Autoregressive and discrete diffusion VLAs typically fix each action token once it is generated, so early token errors cannot easily be corrected. DFM-VLA instead models a probability velocity field that updates the entire action sequence at every refinement step, letting the model 'rethink' its trajectory as decoding proceeds. The authors report state-of-the-art results on the CALVIN and LIBERO benchmarks.
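To make the contrast with "generate once, then freeze" decoding concrete, here is a minimal toy sketch of iterative refinement over a discrete token sequence. It is not the paper's implementation. It assumes a generic Euler-style sampler for a simple mixture probability path, in which each position jumps to a freshly sampled token with probability h / (1 - t) at every step. The names `toy_posterior` and `refine`, the vocabulary size, and the step count are all hypothetical, and the "model" is an oracle stand-in that already knows the target sequence.

```python
import random


def toy_posterior(target, vocab_size, accuracy):
    """Hypothetical stand-in for a learned model.

    Returns, for each position, a distribution over the clean token that
    puts `accuracy` mass on the target and spreads the rest uniformly.
    A real model would condition on the current tokens, the image, and
    the instruction; this oracle ignores all of that.
    """
    off = (1.0 - accuracy) / (vocab_size - 1)
    return [[accuracy if v == tgt else off for v in range(vocab_size)]
            for tgt in target]


def refine(target, vocab_size=8, steps=20, accuracy=0.9, seed=0):
    """Iteratively refine a noisy token sequence toward the posterior."""
    rng = random.Random(seed)
    # Start from pure noise: every position holds a uniformly random token.
    tokens = [rng.randrange(vocab_size) for _ in target]
    for k in range(steps):
        # With step size h = 1/steps and time t = k/steps, the jump
        # probability h / (1 - t) simplifies exactly to 1 / (steps - k).
        jump_prob = 1.0 / (steps - k)
        probs = toy_posterior(target, vocab_size, accuracy)
        for i in range(len(tokens)):
            # Every position may be revised at every step, including
            # positions that were already changed earlier.
            if rng.random() < jump_prob:
                tokens[i] = rng.choices(range(vocab_size), weights=probs[i])[0]
    return tokens


if __name__ == "__main__":
    print(refine([3, 1, 4, 1, 5]))
```

The point of the sketch is the inner loop: no token is ever locked, so a position that was sampled badly early on can still be resampled later. On the final step the jump probability is 1, so every position is drawn from the model's last prediction.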

From the abstract

Vision-Language-Action (VLA) models that encode actions using a discrete tokenization scheme are increasingly adopted for robotic manipulation, but existing decoding paradigms remain fundamentally limited. Whether actions are decoded sequentially by autoregressive VLAs or in parallel by discrete diffusion VLAs, once a token is generated, it is typically fixed and cannot be revised in subsequent iterations, so early token errors cannot be effectively corrected later. We propose DFM-VLA, a discr