ImagiNav enables robots to learn navigation from diverse 'in-the-wild' internet videos by decoupling visual planning from physical actuation.
March 17, 2026
Original Paper
ImagiNav: Scalable Embodied Navigation via Generative Visual Prediction and Inverse Dynamics
arXiv · 2603.13833
The Takeaway
Moves beyond expensive, embodiment-specific robot demonstrations by using generative video models to 'imagine' visual trajectories, then grounding them with an inverse dynamics model. This enables zero-shot transfer of navigation skills from unlabeled, human-centric video data to physical robots.
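The decoupling described above can be sketched in a few lines. In the sketch below, a generative model "imagines" the next observation, and a separate, embodiment-specific inverse-dynamics model recovers the action that would realize it. All function names and models here are illustrative stand-ins (a toy 2D position world), not the paper's actual components.

```python
import numpy as np

def imagine_next_obs(obs: np.ndarray) -> np.ndarray:
    """Stand-in for the generative video model: predicts a desired
    next observation (here, a fixed translation of a 2D position)."""
    return obs + np.array([1.0, 0.5])

def inverse_dynamics(obs: np.ndarray, next_obs: np.ndarray) -> np.ndarray:
    """Stand-in for the inverse-dynamics model: maps an observation
    transition to the low-level action (here, the displacement)."""
    return next_obs - obs

def plan_step(obs: np.ndarray) -> np.ndarray:
    """Decoupled control loop: plan visually (embodiment-agnostic),
    then translate the imagined transition into a robot action."""
    goal_obs = imagine_next_obs(obs)        # visual planning
    return inverse_dynamics(obs, goal_obs)  # physical actuation

obs = np.array([0.0, 0.0])
print(plan_step(obs))  # -> [1.  0.5]
```

The point of the split is that only `inverse_dynamics` needs robot-specific data; the visual planner can be trained on internet-scale human video.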
From the abstract
Enabling robots to navigate open-world environments via natural language is critical for general-purpose autonomy. Yet Vision-Language Navigation has relied on end-to-end policies trained on expensive, embodiment-specific robot data. While recent foundation models trained on vast simulation data show promise, scaling and generalization remain limited by the narrow scene diversity and visual fidelity of simulation. To address this gap, we propose ImagiNav, a novel modular paradigm …