AI & ML New Capability

Enables zero-shot humanoid navigation in unseen environments using only 5 hours of human walking data and no robot-specific data.

April 2, 2026

Original Paper

Learning Humanoid Navigation from Human Data

Weizhuo Wang, Yanjie Ze, C. Karen Liu, Monroe Kennedy III

arXiv · 2604.00416

The Takeaway

EgoNav demonstrates that high-level behaviors (waiting for doors, avoiding glass) can emerge purely from human video-trajectory data when processed through a diffusion prior and a frozen DINOv3 backbone.

From the abstract

We present EgoNav, a system that enables a humanoid robot to traverse diverse, unseen environments by learning entirely from 5 hours of human walking data, with no robot data or finetuning. A diffusion model predicts distributions of plausible future trajectories conditioned on past trajectory, a 360° visual memory fusing color, depth, and semantics, and video features from a frozen DINOv3 backbone that capture appearance cues invisible to depth sensors. A hybrid sampling scheme achieves real […]
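To make the core idea concrete, here is a minimal sketch of what "a diffusion model predicts distributions of plausible future trajectories conditioned on context" means in code. This is not the authors' implementation: the denoiser below is a hand-written stand-in, and all names (`denoiser`, `sample_trajectory`, `context`) are illustrative assumptions. In EgoNav the denoiser would be a learned network conditioned on past trajectory, the 360° visual memory, and DINOv3 features.

```python
import numpy as np

T = 50                      # diffusion steps
H, D = 16, 2                # horizon: 16 future (x, y) waypoints

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, T)   # standard DDPM noise schedule
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

def denoiser(x_t, t, context):
    # Stand-in for a learned noise-prediction network eps_theta(x_t, t, context).
    # Toy assumption: the "clean" trajectory is a straight line toward a goal
    # encoded in the context; the implied noise follows from the forward process
    # x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps.
    goal = context[:2]
    x0_guess = np.linspace(0.0, 1.0, H)[:, None] * goal[None, :]
    return (x_t - np.sqrt(alpha_bar[t]) * x0_guess) / np.sqrt(1.0 - alpha_bar[t])

def sample_trajectory(context):
    x = rng.standard_normal((H, D))          # start from pure noise
    for t in reversed(range(T)):
        eps = denoiser(x, t, context)
        # DDPM reverse-step posterior mean
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:                            # add noise except at the last step
            x += np.sqrt(betas[t]) * rng.standard_normal((H, D))
    return x

context = np.array([2.0, 1.0])               # e.g. a goal-direction feature
traj = sample_trajectory(context)
print(traj.shape)                            # one sampled 16-waypoint trajectory
```

Because sampling starts from fresh noise each call, repeated calls yield a distribution of plausible trajectories rather than a single path, which is what lets the planner represent multiple valid routes through an unseen environment.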