Achieves up to 400x speedup and 64x memory reduction for open-vocabulary 3D scene understanding compared to current Gaussian Splatting methods.
March 26, 2026
Original Paper
LightSplat: Fast and Memory-Efficient Open-Vocabulary 3D Scene Understanding in Five Seconds
arXiv · 2603.24146
The Takeaway
LightSplat eliminates the need for per-Gaussian feature optimization and iterative refinement, allowing language-driven 3D segmentation in under five seconds. This makes real-time, language-guided interaction with 3D environments feasible on standard hardware.
From the abstract
Open-vocabulary 3D scene understanding enables users to segment novel objects in complex 3D environments through natural language. However, existing approaches remain slow, memory-intensive, and overly complex due to iterative optimization and dense per-Gaussian feature assignments. To address this, we propose LightSplat, a fast and memory-efficient training-free framework that injects compact 2-byte semantic indices into 3D representations from multi-view images. By assigning semantic indices o
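
To make the idea of a compact 2-byte semantic index concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the `codebook`, the toy 3-dimensional feature vectors, the `select_gaussians` helper, and the similarity threshold are all hypothetical names and values chosen for illustration. The only point it demonstrates is that each Gaussian can carry a 16-bit index into a shared table of features instead of its own dense feature vector, and that a language query can then be resolved against the table rather than against every Gaussian.

```python
from array import array
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical shared table: one feature vector per semantic index.
# Real features would come from a vision-language model; these are toy values.
codebook = [
    [1.0, 0.0, 0.0],  # semantic index 0
    [0.0, 1.0, 0.0],  # semantic index 1
    [0.0, 0.0, 1.0],  # semantic index 2
]

# One unsigned 16-bit index per Gaussian (type code "H"),
# so a single index can address up to 65,536 table entries.
gaussian_index = array("H", [0, 2, 1, 0, 2, 2])


def select_gaussians(query, threshold=0.8):
    """Return the positions of Gaussians whose semantic entry matches the query.

    The query is compared against the table entries only; the per-Gaussian
    work is a plain index lookup. The threshold is an assumed value.
    """
    matched = {
        i for i, feature in enumerate(codebook)
        if cosine(query, feature) >= threshold
    }
    return [g for g, idx in enumerate(gaussian_index) if idx in matched]


# Stand-in for a text embedding that is close to table entry 0.
selected = select_gaussians([0.9, 0.1, 0.0])
bytes_per_gaussian = gaussian_index.itemsize
```

In this toy setup the semantic storage per Gaussian is the size of one index, and the cost of a query scales with the number of table entries plus one lookup per Gaussian. How LightSplat actually builds and assigns its indices is described in the paper itself.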