ImagiNav enables robots to learn navigation from diverse 'in-the-wild' internet videos by decoupling visual planning from physical actuation.
March 17, 2026
Original Paper
ImagiNav: Scalable Embodied Navigation via Generative Visual Prediction and Inverse Dynamics
arXiv · 2603.13833
The Takeaway
Moves beyond expensive, embodiment-specific robot demonstrations by using generative video models to 'imagine' visual trajectories, then grounding them with an inverse dynamics model. This enables zero-shot transfer of navigation skills from unlabeled, human-centric video data to physical robots.
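The decoupling described above can be sketched in a few lines. In the sketch below, a generative model "imagines" the next observation, and a separate, embodiment-specific inverse-dynamics model recovers the action that would realize it. All function names and models here are illustrative stand-ins (a toy 2D position world), not the paper's actual components.

```python
import numpy as np

def imagine_next_obs(obs: np.ndarray) -> np.ndarray:
    """Stand-in for the generative video model: predicts a desired
    next observation (here, a fixed translation of a 2D position)."""
    return obs + np.array([1.0, 0.5])

def inverse_dynamics(obs: np.ndarray, next_obs: np.ndarray) -> np.ndarray:
    """Stand-in for the inverse-dynamics model: maps an observation
    transition to the low-level action (here, the displacement)."""
    return next_obs - obs

def plan_step(obs: np.ndarray) -> np.ndarray:
    """Decoupled control loop: plan visually (embodiment-agnostic),
    then translate the imagined transition into a robot action."""
    goal_obs = imagine_next_obs(obs)        # visual planning
    return inverse_dynamics(obs, goal_obs)  # physical actuation

obs = np.array([0.0, 0.0])
print(plan_step(obs))  # -> [1.  0.5]
```

The point of the split is that only `inverse_dynamics` needs robot-specific data; the visual planner can be trained on internet-scale human video.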
From the abstract
Enabling robots to navigate open-world environments via natural language is critical for general-purpose autonomy. Yet Vision-Language Navigation has relied on end-to-end policies trained on expensive, embodiment-specific robot data. While recent foundation models trained on vast simulation data show promise, scaling and generalization remain limited by the narrow scene diversity and visual fidelity of simulation. To address this gap, we propose ImagiNav, a novel modular paradigm …