Enables zero-shot humanoid navigation in unseen environments using only 5 hours of human walking data and no robot-specific data.
April 2, 2026
Original Paper
Learning Humanoid Navigation from Human Data
arXiv · 2604.00416
The Takeaway
EgoNav demonstrates that high-level behaviors (waiting for doors, avoiding glass) can emerge purely from human video-trajectory data when processed through a diffusion prior and a frozen DINOv3 backbone.
From the abstract
We present EgoNav, a system that enables a humanoid robot to traverse diverse, unseen environments by learning entirely from 5 hours of human walking data, with no robot data or finetuning. A diffusion model predicts distributions of plausible future trajectories conditioned on past trajectory, a 360° visual memory fusing color, depth, and semantics, and video features from a frozen DINOv3 backbone that capture appearance cues invisible to depth sensors. A hybrid sampling scheme achieves real…
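The core idea in the abstract — a diffusion model that samples a *distribution* of plausible future trajectories given the past path and fused visual features — can be illustrated with a toy sketch. Everything below is illustrative: the dimensions, the random linear "denoiser", the conditioning fusion, and the DDPM-style sampling loop are assumptions, not the paper's actual architecture.

```python
import numpy as np

# Hypothetical shapes; the paper does not specify its dimensions.
H_PAST, H_FUT, D = 8, 16, 2    # past steps, future steps, (x, y)
COND_DIM = 32                  # size of the fused conditioning vector

rng = np.random.default_rng(0)

def encode_condition(past_traj, visual_feat):
    """Toy stand-in for fusing the past trajectory with visual-memory
    features (the paper fuses a 360° memory and DINOv3 video features)."""
    flat = np.concatenate([past_traj.ravel(), visual_feat])
    W = rng.standard_normal((COND_DIM, flat.size)) * 0.1  # random projection
    return np.tanh(W @ flat)

def denoise_step(x_t, t, cond, W):
    """Toy linear 'network' predicting the noise in x_t at time t."""
    inp = np.concatenate([x_t.ravel(), cond, [t]])
    return (W @ inp).reshape(x_t.shape)

def sample_trajectories(past_traj, visual_feat, n_samples=4, n_steps=20):
    """DDPM-style ancestral sampling of future waypoint sequences.
    Returning several samples approximates a distribution over futures."""
    cond = encode_condition(past_traj, visual_feat)
    W = rng.standard_normal((H_FUT * D, H_FUT * D + COND_DIM + 1)) * 0.05
    betas = np.linspace(1e-4, 0.05, n_steps)
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    out = []
    for _ in range(n_samples):
        x = rng.standard_normal((H_FUT, D))  # start from pure noise
        for t in reversed(range(n_steps)):
            eps = denoise_step(x, t / n_steps, cond, W)
            x = (x - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alphas[t])
            if t > 0:  # no noise added at the final step
                x += np.sqrt(betas[t]) * rng.standard_normal(x.shape)
        out.append(x)
    return np.stack(out)  # (n_samples, H_FUT, D)

past = np.cumsum(rng.standard_normal((H_PAST, D)) * 0.1, axis=0)
feat = rng.standard_normal(64)  # placeholder for frozen-backbone features
futures = sample_trajectories(past, feat)
print(futures.shape)  # multiple candidate futures per conditioning input
```

Sampling several futures per observation is what lets a downstream controller pick a safe trajectory from the predicted distribution rather than committing to a single regressed path.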