Shifts 3D scene generation from diffusion to a fully autoregressive paradigm using next-token prediction of 3D Gaussian primitives.
March 30, 2026
Original Paper
GaussianGPT: Towards Autoregressive 3D Gaussian Scene Generation
arXiv · 2603.26661
The Takeaway
By treating 3D scenes as sequences of tokens, GaussianGPT enables LLM-style capabilities such as partial scene completion and outpainting while remaining compatible with real-time neural rendering pipelines.
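To make the "scene as a token sequence" idea concrete, here is a minimal Python sketch of how completion can be framed as continuing a prefix. It is not the paper's model: `toy_next_token_scores`, `complete_sequence`, and `VOCAB_SIZE` are hypothetical names, and the scoring function is a deterministic stand-in for a trained transformer. It only illustrates that the observed tokens are kept fixed while new tokens are appended one at a time.

```python
import random

VOCAB_SIZE = 16  # hypothetical codebook size, chosen only for this toy example


def toy_next_token_scores(prefix):
    """Stand-in for a trained transformer (an assumption, not the paper's model).

    Returns one score per vocabulary entry. The scores depend only on the
    last token, so this behaves like a fixed bigram table.
    """
    last = prefix[-1] if prefix else 0
    rng = random.Random(last)  # same last token always yields the same scores
    return [rng.random() for _ in range(VOCAB_SIZE)]


def complete_sequence(prefix, total_len):
    """Greedy next-token completion: keep the observed prefix, append predictions."""
    tokens = list(prefix)
    while len(tokens) < total_len:
        scores = toy_next_token_scores(tokens)
        # Pick the highest-scoring token id; a real model would typically sample.
        tokens.append(max(range(VOCAB_SIZE), key=scores.__getitem__))
    return tokens


# Tokens for the part of the scene that is already known (made-up values).
observed = [3, 7, 7, 1]
full = complete_sequence(observed, total_len=10)
```

In this framing, partial scene completion and outpainting are the same operation: the known region supplies the prefix and the model predicts the rest. How the tokens of a 3D grid are ordered into a sequence is a separate design choice that this sketch does not address.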
From the abstract
Most recent advances in 3D generative modeling rely on diffusion or flow-matching formulations. We instead explore a fully autoregressive alternative and introduce GaussianGPT, a transformer-based model that directly generates 3D Gaussians via next-token prediction, thus facilitating full 3D scene generation. We first compress Gaussian primitives into a discrete latent grid using a sparse 3D convolutional autoencoder with vector quantization. The resulting tokens are serialized and modeled using