Replaces standard autoregressive action generation in robot VLAs with iterative refinement via discrete flow matching.
March 30, 2026
Original Paper
DFM-VLA: Iterative Action Refinement for Robot Manipulation via Discrete Flow Matching
arXiv · 2603.26320
The Takeaway
Autoregressive and discrete diffusion VLAs typically fix each action token once it is generated, so early token errors cannot easily be corrected later. DFM-VLA instead models a probability velocity field that can update the entire action sequence at every refinement step. The paper reports state-of-the-art results on the CALVIN and LIBERO benchmarks, which it credits to the model's ability to revise, or "rethink", its trajectory during decoding.
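To make the refinement idea concrete, here is a minimal sketch of a generic discrete flow matching sampler, not the paper's implementation. It assumes a uniform-noise starting sequence, a linear schedule, and a plain Euler step; `toy_posterior` is a hypothetical stand-in for the learned network, and `refine`, `sharpness`, and `num_steps` are illustrative names. The point it shows is that every position can be resampled at every step, so an early wrong token is not locked in.

```python
import numpy as np

def toy_posterior(x, t, target, vocab_size, sharpness=4.0):
    """Hypothetical stand-in for a learned model: returns, for every
    position, a distribution over clean tokens peaked at `target`.
    A real model would condition on x, t, images, and the instruction."""
    logits = np.zeros((len(x), vocab_size))
    logits[np.arange(len(x)), target] = sharpness
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    return probs / probs.sum(axis=1, keepdims=True)

def refine(posterior_fn, seq_len, vocab_size, num_steps=16, seed=0):
    """Euler sampler for discrete flow matching (assumed linear schedule)."""
    rng = np.random.default_rng(seed)
    # Start from pure noise: uniformly random action tokens.
    x = rng.integers(0, vocab_size, size=seq_len)
    h = 1.0 / num_steps
    for k in range(num_steps):
        t = k * h
        p1 = posterior_fn(x, t)            # shape (seq_len, vocab_size)
        onehot = np.eye(vocab_size)[x]     # current tokens as one-hot rows
        # Probability velocity: mass flows from the current token toward
        # the predicted clean-token distribution, faster as t approaches 1.
        velocity = (p1 - onehot) / (1.0 - t)
        # One Euler step gives a per-position transition distribution.
        step_probs = np.clip(onehot + h * velocity, 0.0, None)
        step_probs /= step_probs.sum(axis=1, keepdims=True)
        # Every position is resampled, so any token may still change.
        x = np.array([rng.choice(vocab_size, p=row) for row in step_probs])
    return x

# Usage: refine a 6-token action sequence over a vocabulary of 8 tokens.
target = np.array([3, 1, 4, 1, 5, 2])
out = refine(lambda x, t: toy_posterior(x, t, target, vocab_size=8),
             seq_len=6, vocab_size=8)
print(out)
```

In this sketch the chance that a given token is redrawn at step k is h / (1 - t), which is small early on and reaches 1 at the final step. Autoregressive decoding, by contrast, commits to each token as soon as it is emitted.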
From the abstract
Vision-Language-Action (VLA) models that encode actions using a discrete tokenization scheme are increasingly adopted for robotic manipulation, but existing decoding paradigms remain fundamentally limited. Whether actions are decoded sequentially by autoregressive VLAs or in parallel by discrete diffusion VLAs, once a token is generated, it is typically fixed and cannot be revised in subsequent iterations, so early token errors cannot be effectively corrected later. We propose DFM-VLA, a discr