StreamingVLA eliminates execution halting in robots by asynchronously parallelizing observation, generation, and execution.
March 31, 2026
Original Paper
StreamingVLA: Streaming Vision-Language-Action Model with Action Flow Matching and Adaptive Early Observation
arXiv · 2603.28565
The Takeaway
StreamingVLA achieves a 2.4x latency speedup and a 6.5x reduction in halting using action flow matching and adaptive early observation, making VLA models viable for high-frequency, fluid real-world control.
From the abstract
Vision-language-action (VLA) models have demonstrated exceptional performance in natural language-driven perception and control. However, the high computational cost of VLA models poses significant efficiency challenges, particularly for resource-constrained edge platforms in real-world deployments. Moreover, since the stages of a VLA pipeline (observation, action generation, and execution) must proceed sequentially, each waiting for the preceding stage to complete, the system suffers from frequent halting.
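The sequential-stage bottleneck described above can be illustrated with a minimal pipelining sketch. This is not the paper's implementation; the stage functions, delays, and step count below are invented for illustration. Each stage runs in its own thread with queues between them, so a new observation can begin while the previous action is still executing:

```python
import queue
import threading
import time

STAGE_DELAY = 0.05  # hypothetical per-stage latency
N_STEPS = 5

def observe(i):
    time.sleep(STAGE_DELAY)  # stand-in for camera capture + encoding
    return f"obs{i}"

def generate(obs):
    time.sleep(STAGE_DELAY)  # stand-in for action generation
    return f"act({obs})"

def execute(act):
    time.sleep(STAGE_DELAY)  # stand-in for motor execution
    return act

def run_sequential():
    # Baseline: each stage waits for the preceding one to finish.
    start = time.perf_counter()
    for i in range(N_STEPS):
        execute(generate(observe(i)))
    return time.perf_counter() - start

def run_pipelined():
    # Streaming variant: stages overlap across consecutive control steps.
    obs_q, act_q = queue.Queue(), queue.Queue()

    def observer():
        for i in range(N_STEPS):
            obs_q.put(observe(i))
        obs_q.put(None)  # sentinel: no more observations

    def generator():
        while (obs := obs_q.get()) is not None:
            act_q.put(generate(obs))
        act_q.put(None)

    def executor():
        while (act := act_q.get()) is not None:
            execute(act)

    start = time.perf_counter()
    threads = [threading.Thread(target=t)
               for t in (observer, generator, executor)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    print(f"sequential: {run_sequential():.2f}s  "
          f"pipelined: {run_pipelined():.2f}s")
```

With these toy delays, the sequential loop takes roughly N_STEPS x 3 x STAGE_DELAY, while the pipelined loop approaches N_STEPS x STAGE_DELAY once the pipeline fills, which is the intuition behind removing execution halting. (Python's `time.sleep` releases the GIL, so the threads genuinely overlap here.)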