A training-free system for 3D scene reconstruction and editing from sparse RGB images using 3D-aware diffusion models to fill geometric gaps.
March 24, 2026
Original Paper
Training-Free Instance-Aware 3D Scene Reconstruction and Diffusion-Based View Synthesis from Sparse Images
arXiv · 2603.21166
The Takeaway
The system eliminates the need for dense views and per-scene optimization (as required by NeRF or 3D Gaussian Splatting). Practitioners can reconstruct and edit consistent 3D scenes (e.g., removing objects) from just a few unposed photos using a modular, training-free pipeline.
From the abstract
We introduce a novel, training-free system for reconstructing, understanding, and rendering 3D indoor scenes from a sparse set of unposed RGB images. Unlike traditional radiance field approaches that require dense views and per-scene optimization, our pipeline achieves high-fidelity results without any training or pose preprocessing. The system integrates three key innovations: (1) A robust point cloud reconstruction module that filters unreliable geometry using a warping-based anomaly removal s
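The paper does not detail its warping-based anomaly removal, but the general idea of such filters is to warp each reconstructed 3D point into another view and discard points whose predicted depth disagrees with that view's depth map. The sketch below is a hypothetical, minimal illustration of this consistency check, assuming a simple pinhole camera and a per-pixel reference depth map; the function names, intrinsics, and tolerance are illustrative, not the paper's actual implementation.

```python
def project(point, fx, fy, cx, cy):
    """Pinhole projection of a 3D point (in the reference camera frame)
    to integer pixel coordinates plus its depth; None if behind camera."""
    x, y, z = point
    if z <= 0:
        return None
    return (round(fx * x / z + cx), round(fy * y / z + cy), z)

def filter_anomalies(points, ref_depth, fx=100.0, fy=100.0,
                     cx=32.0, cy=32.0, tol=0.05):
    """Keep points whose depth, after warping into the reference view,
    agrees with that view's depth map within a relative tolerance."""
    kept = []
    for p in points:
        proj = project(p, fx, fy, cx, cy)
        if proj is None:
            continue
        u, v, z = proj
        observed = ref_depth.get((u, v))
        # Drop points with no depth evidence or a large depth mismatch.
        if observed is not None and abs(z - observed) / observed <= tol:
            kept.append(p)
    return kept

# Toy scene: two points on a wall at z = 2.0, one spurious point at z = 3.0
# that projects onto the same pixel as a wall point.
points = [(0.0, 0.0, 2.0), (0.2, 0.0, 2.0), (0.0, 0.0, 3.0)]
ref_depth = {(32, 32): 2.0, (42, 32): 2.0}
filtered = filter_anomalies(points, ref_depth)
print(filtered)  # the z = 3.0 outlier is removed
```

In a real multi-view pipeline this check would be repeated across several views, keeping only points that are consistent in enough of them.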