AI & ML New Capability

X-World is a controllable, action-conditioned multi-camera world model that simulates realistic future video observations for end-to-end driving.

March 23, 2026

Original Paper

X-World: Controllable Ego-Centric Multi-Camera World Models for Scalable End-to-End Driving

Chaoda Zheng, Sean Li, Jinhao Deng, Zhennan Wang, Shijia Chen, Liqiang Xiao, Ziheng Chi, Hongbin Lin, Kangjie Chen, Boyang Wang, Yu Zhang, Xianming Liu

arXiv · 2603.19979

The Takeaway

X-World bridges pure generative video models and robotics simulators. Developers can test driving policies in a 'real-world simulator' that respects commanded actions, road geometry, and temporal consistency across multiple camera views.
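The closed-loop evaluation idea can be sketched as a simple rollout loop. This is not X-World's interface. `WorldModel`, `Policy`, `Action`, `ToyWorldModel`, `CruisePolicy`, `rollout`, and the four-camera rig are all hypothetical. The toy "world model" does not generate video or model road geometry. It integrates a tiny kinematic state and copies it into every camera view, so the views stay consistent by construction. The sketch only shows the loop structure: the policy reads a multi-camera observation and commits an action, and the world model returns the next observation conditioned on that action.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol

# Hypothetical camera rig; not taken from the paper.
CAMERAS = ("front", "front_left", "front_right", "back")

# Camera name -> toy "frame" (here just a small feature vector, not an image).
Observation = Dict[str, List[float]]


@dataclass
class Action:
    steer: float  # hypothetical steering command (lateral rate)
    accel: float  # hypothetical acceleration command (m/s^2)


class WorldModel(Protocol):
    def reset(self) -> Observation: ...
    def step(self, action: Action) -> Observation: ...


class Policy(Protocol):
    def act(self, obs: Observation) -> Action: ...


class ToyWorldModel:
    """Stand-in for a learned generator: tracks a minimal ego state and renders
    the same frame into every camera, so all views agree at each step."""

    def __init__(self, dt: float = 0.5) -> None:
        self.dt = dt
        self.pos = self.speed = self.lateral = 0.0

    def _render(self) -> Observation:
        frame = [self.pos, self.speed, self.lateral]
        return {cam: list(frame) for cam in CAMERAS}

    def reset(self) -> Observation:
        self.pos = self.speed = self.lateral = 0.0
        return self._render()

    def step(self, action: Action) -> Observation:
        # The next observation depends on the commanded action (action-conditioned).
        self.speed = max(0.0, self.speed + action.accel * self.dt)
        self.pos += self.speed * self.dt
        self.lateral += action.steer * self.dt
        return self._render()


class CruisePolicy:
    """Hypothetical policy: accelerate up to a target speed, steer back to center."""

    def __init__(self, target_speed: float = 5.0) -> None:
        self.target = target_speed

    def act(self, obs: Observation) -> Action:
        _, speed, lateral = obs["front"]
        return Action(steer=-0.5 * lateral, accel=1.0 if speed < self.target else 0.0)


def rollout(model: WorldModel, policy: Policy, horizon: int) -> List[Observation]:
    """Closed loop: policy acts on the current observation, model predicts the next."""
    obs = model.reset()
    trajectory: List[Observation] = []
    for _ in range(horizon):
        action = policy.act(obs)
        obs = model.step(action)
        trajectory.append(obs)
    return trajectory


if __name__ == "__main__":
    traj = rollout(ToyWorldModel(), CruisePolicy(), horizon=12)
    print(traj[-1]["front"])
```

Swapping `ToyWorldModel` for a learned multi-camera generator would leave `rollout` unchanged, which is the motivation for keeping the interface as action in, observation out.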

From the abstract

Scalable and reliable evaluation is increasingly critical in the end-to-end era of autonomous driving, where vision-language-action (VLA) policies directly map raw sensor streams to driving actions. Yet, current evaluation pipelines still rely heavily on real-world road testing, which is costly, biased toward limited scenario coverage, and difficult to reproduce. These challenges motivate a real-world simulator that can generate realistic future observations under proposed actions, while remai…