AirVLA successfully transfers manipulation-trained Vision-Language-Action (VLA) models to underactuated aerial robots using a payload-aware guidance mechanism.
March 27, 2026
Original Paper
$\pi$, But Make It Fly: Physics-Guided Transfer of VLA Models to Aerial Manipulation
arXiv · 2603.25038
The Takeaway
AirVLA bridges the "dynamics gap" between quasi-static, fixed-base robot arms and underactuated flight without retraining the foundation model. This lets general-purpose VLAs operate in highly dynamic settings such as aerial pick-and-place, significantly expanding the utility of pre-trained robotics models.
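The paper summary names a payload-aware guidance mechanism but does not spell out its form. As a purely illustrative sketch (the function, gain `kp`, and the additive-correction structure are assumptions, not the paper's actual method), one way such guidance could wrap a frozen VLA is to bias the model's end-effector command to counteract the gravity load a carried payload places on the underactuated aerial base:

```python
import numpy as np

def payload_aware_guidance(vla_action, payload_mass, g=9.81, kp=0.05):
    """Hypothetical sketch of payload-aware guidance (not from the paper).

    vla_action   : (3,) end-effector velocity command from the frozen VLA [m/s]
    payload_mass : estimated mass of the grasped payload [kg]
    kp           : assumed compensation gain [(m/s) per N]

    Adds an upward velocity bias proportional to the payload's weight so the
    pretrained, manipulation-centric policy is not retrained; the guidance
    layer alone absorbs the flight-dynamics mismatch.
    """
    weight = payload_mass * g                      # payload gravity load [N]
    correction = np.array([0.0, 0.0, kp * weight]) # bias only the vertical axis
    return np.asarray(vla_action, dtype=float) + correction

# With no payload, the guidance layer leaves the VLA command unchanged.
cmd = payload_aware_guidance([0.1, 0.0, -0.2], payload_mass=0.0)
```

The key design point this sketch illustrates is that the correction lives entirely outside the foundation model: the VLA's output is post-processed, so the pretrained weights stay frozen.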
From the abstract
Vision-Language-Action (VLA) models such as $\pi_0$ have demonstrated remarkable generalization across diverse fixed-base manipulators. However, transferring these foundation models to aerial platforms remains an open challenge due to the fundamental mismatch between the quasi-static dynamics of fixed-base arms and the underactuated, highly dynamic nature of flight. In this work, we introduce AirVLA, a system that investigates the transferability of manipulation-pretrained VLAs to aerial pick-and-place…