AI & ML Paradigm Shift

Enables training of monocular novel-view synthesis models using entirely unpaired, in-the-wild internet images.

March 25, 2026

Original Paper

One View Is Enough! Monocular Training for In-the-Wild Novel View Generation

Adrien Ramanana Rahary, Nicolas Dufour, Patrick Perez, David Picard

arXiv · 2603.23488

The Takeaway

Novel-view synthesis (NVS) has traditionally required multi-view image pairs for supervision, which are hard to collect at scale. OVIE instead uses a monocular depth estimator as a geometric scaffold, together with masked training, to learn 3D consistency from 30 million uncurated images, so ordinary single-view internet photos become usable training data for 3D vision.
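To make the depth-guided scaffold concrete, here is a minimal NumPy sketch of building a pseudo-target view from one image: lift pixels to 3D with a depth map, apply a camera transformation, and project back. It is an illustration under stated assumptions, not the authors' implementation. The function names `warp_to_pseudo_target` and `masked_l1`, the pinhole intrinsics `K`, the rigid transform `(R, t)`, and the nearest-pixel z-buffer splatting are all hypothetical choices. The masked L1 loss in particular is only a guess at what masked training could look like, since the paper's exact formulation is not given here.

```python
import numpy as np

def warp_to_pseudo_target(image, depth, K, R, t):
    """Forward-warp image (H, W, C) into a new camera using depth (H, W).

    K: 3x3 pinhole intrinsics (assumed shared by both views).
    (R, t): maps source-camera coordinates to target-camera coordinates.
    Returns the warped image and a boolean mask that is False where no
    source pixel landed (disoccluded holes).
    """
    H, W = depth.shape
    C = image.shape[-1]
    # Homogeneous pixel grid, shape (3, H*W).
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u.ravel(), v.ravel(), np.ones(H * W)], axis=0).astype(np.float64)

    # Lift: back-project every pixel to a 3D point in the source frame.
    pts = (np.linalg.inv(K) @ pix) * depth.ravel()
    # Transform: move the points into the target camera frame.
    pts_t = R @ pts + np.asarray(t, dtype=np.float64).reshape(3, 1)

    # Project: perspective division, keeping only points in front of the camera.
    z = pts_t[2]
    valid = z > 1e-6
    proj = K @ pts_t
    z_safe = np.where(valid, z, 1.0)
    ut = np.rint(proj[0] / z_safe).astype(int)
    vt = np.rint(proj[1] / z_safe).astype(int)
    valid &= (ut >= 0) & (ut < W) & (vt >= 0) & (vt < H)

    # Z-buffer: when several points hit one target pixel, keep the nearest.
    idx = np.flatnonzero(valid)
    flat = vt[idx] * W + ut[idx]
    zbuf = np.full(H * W, np.inf)
    np.minimum.at(zbuf, flat, z[idx])
    winners = z[idx] <= zbuf[flat]

    colors = image.reshape(-1, C)
    target = np.zeros((H * W, C), dtype=image.dtype)
    mask = np.zeros(H * W, dtype=bool)
    target[flat[winners]] = colors[idx[winners]]
    mask[flat[winners]] = True
    return target.reshape(H, W, C), mask.reshape(H, W)

def masked_l1(pred, target, mask):
    """Mean absolute error over valid pixels only (an assumed loss, see above)."""
    if not mask.any():
        return 0.0
    return float(np.abs(pred - target)[mask].mean())
```

In this sketch, a model's prediction for the sampled camera would be compared against the pseudo-target only where `mask` is True, so the holes left by disocclusion contribute no training signal.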

From the abstract

Monocular novel-view synthesis has long required multi-view image pairs for supervision, limiting training data scale and diversity. We argue it is not necessary: one view is enough. We present OVIE, trained entirely on unpaired internet images. We leverage a monocular depth estimator as a geometric scaffold at training time: we lift a source image into 3D, apply a sampled camera transformation, and project to obtain a pseudo-target view. To handle disocclusions, we introduce a masked training f