Uni-World VLA moves autonomous driving from 'predict-then-plan' to an interleaved VLA model in which future frames and ego-actions are generated step by step.
March 31, 2026
Original Paper
Uni-World VLA: Interleaved World Modeling and Planning for Autonomous Driving
arXiv · 2603.27287
The Takeaway
Tightly interleaving prediction and planning is meant to curb the 'imagination drift' common in world models: because planning is continuously conditioned on an evolving imagined future, the model can adapt its decisions in dynamic traffic, where open-loop rollouts may drift from the actual decision process.
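To make the contrast concrete, here is a minimal toy sketch of the two control flows. It is not the paper's model. Everything in it is an assumption for illustration: a "frame" is reduced to a single number (the gap to a lead vehicle), an "action" is the ego speed, the order of planning and prediction within a step is a guess, and the names `predict_next_frame`, `plan_action`, `interleaved_rollout`, and `predict_then_plan` are hypothetical.

```python
from typing import List, Tuple

# Toy constants (assumed for illustration, not taken from the paper).
LEAD_SPEED = 5.0  # speed of the lead vehicle, m/s
SAFE_GAP = 10.0   # gap the toy planner tries to hold, m
DT = 1.0          # time step, s


def predict_next_frame(gap: float, ego_speed: float) -> float:
    """Hypothetical world-model step: next gap given the ego action."""
    return gap + (LEAD_SPEED - ego_speed) * DT


def plan_action(gap: float) -> float:
    """Hypothetical planner: go faster when the gap is large, slower when small."""
    return LEAD_SPEED + 0.5 * (gap - SAFE_GAP)


def interleaved_rollout(gap: float, steps: int) -> Tuple[List[float], List[float]]:
    """Alternate planning and prediction so each one conditions on the other."""
    frames, actions = [], []
    for _ in range(steps):
        action = plan_action(gap)              # plan from the latest imagined frame
        gap = predict_next_frame(gap, action)  # imagine the next frame under that action
        actions.append(action)
        frames.append(gap)
    return frames, actions


def predict_then_plan(
    gap: float, steps: int, assumed_speed: float
) -> Tuple[List[float], List[float]]:
    """Roll out every frame under a fixed assumed action, then plan afterwards."""
    frames = []
    for _ in range(steps):
        gap = predict_next_frame(gap, assumed_speed)
        frames.append(gap)
    actions = [plan_action(g) for g in frames]  # plans never feed back into frames
    return frames, actions


if __name__ == "__main__":
    print(interleaved_rollout(20.0, 3))     # ([15.0, 12.5, 11.25], [10.0, 7.5, 6.25])
    print(predict_then_plan(20.0, 3, 5.0))  # ([20.0, 20.0, 20.0], [10.0, 10.0, 10.0])
```

In the interleaved loop the imagined gap shrinks as the planned speeds take effect, so later plans respond to it. In the predict-then-plan loop the imagined frames never reflect the chosen actions, which is a toy analogue of imagination drifting away from the decision process.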
From the abstract
Autonomous driving requires reasoning about how the environment evolves and planning actions accordingly. Existing world-model-based approaches typically predict future scenes first and plan afterwards, resulting in open-loop imagination that may drift from the actual decision process. In this paper, we present Uni-World VLA, a unified vision-language-action (VLA) model that tightly interleaves future frame prediction and trajectory planning. Instead of generating a full world rollout before pla