AI & ML Practical Magic

Complex 3D scenes that used to take minutes to segment into individual objects can now be processed in 0.14 seconds.

April 25, 2026

Original Paper

SpaCeFormer: Fast Proposal-Free Open-Vocabulary 3D Instance Segmentation

arXiv · 2604.20395

The Takeaway

Earlier open-vocabulary 3D instance segmentation methods rely on multi-stage 2D+3D pipelines that can take hundreds of seconds per scene, far too slow for real-time use. SpaCeFormer, a proposal-free architecture, processes a scene in 0.14 seconds, which the authors report as 2-3 orders of magnitude faster than those pipelines. At roughly seven scenes per second, a robot or headset could identify individual objects without pausing to process its surroundings. That removes a major speed barrier to practical augmented reality and autonomous home robots.

From the abstract

Open-vocabulary 3D instance segmentation is a core capability for robotics and AR/VR, but prior methods trade one bottleneck for another: multi-stage 2D+3D pipelines aggregate foundation-model outputs at hundreds of seconds per scene, while pseudo-labeled end-to-end approaches rely on fragmented masks and external region proposals. We present SpaCeFormer, a proposal-free space-curve transformer that runs at 0.14 seconds per scene, 2-3 orders of magnitude faster than multi-stage 2D+3D pipelines.
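
To make the "2-3 orders of magnitude" figure concrete, here is a minimal sketch of the arithmetic. Only the 0.14-second runtime comes from the abstract. The baseline runtimes and the helper name orders_of_magnitude are assumptions chosen for illustration, not measurements from the paper.

```python
import math

SPACEFORMER_SECONDS = 0.14  # per-scene runtime quoted in the abstract


def orders_of_magnitude(baseline_seconds: float,
                        fast_seconds: float = SPACEFORMER_SECONDS) -> float:
    """Return log10 of the speedup factor baseline / fast."""
    return math.log10(baseline_seconds / fast_seconds)


# Hypothetical baseline runtimes in seconds per scene (illustrative only).
for baseline in (14.0, 140.0, 300.0):
    speedup = baseline / SPACEFORMER_SECONDS
    print(f"{baseline:6.1f} s per scene -> {speedup:7.0f}x faster "
          f"({orders_of_magnitude(baseline):.2f} orders of magnitude)")

# Throughput implied by the quoted runtime.
print(f"{1 / SPACEFORMER_SECONDS:.1f} scenes per second")
```

Under these assumptions, a baseline of 14 seconds per scene corresponds to two orders of magnitude and 140 seconds to three, so the "hundreds of seconds" quoted for multi-stage pipelines sits at the upper end of the claimed range.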