Edit-As-Act reframes 3D scene editing as a goal-regressive planning problem using symbolic action languages rather than purely generative pixel manipulation.
March 19, 2026
Original Paper
Edit-As-Act: Goal-Regressive Planning for Open-Vocabulary 3D Indoor Scene Editing
arXiv · 2603.17583
The Takeaway
By treating edits as sequences of PDDL-inspired actions that account for support, contact, and collision relations, Edit-As-Act aims to keep edited scenes physically plausible and globally consistent. This moves 3D editing away from "black-box" generation toward interpretable, physically grounded transformations.
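To make the idea of goal-regressive planning concrete, here is a minimal sketch of backward search over STRIPS-style actions. It is not the paper's action language or planner. The predicate strings, the `Action` class, and the `regress` and `plan_backward` helpers are all hypothetical names chosen for illustration, and the toy scene (a lamp, a vase, a table, a shelf) is invented.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A grounded symbolic action (hypothetical, STRIPS-style)."""
    name: str
    pre: frozenset      # predicates that must hold before the action
    add: frozenset      # predicates the action makes true
    delete: frozenset   # predicates the action makes false


def regress(goal, action):
    """Return the subgoal that must hold before `action` so that `goal`
    holds afterwards, or None if the action is irrelevant or destructive."""
    if not (action.add & goal):   # action achieves nothing in the goal
        return None
    if action.delete & goal:      # action would undo part of the goal
        return None
    return (goal - action.add) | action.pre


def plan_backward(initial, goal, actions, max_depth=6):
    """Breadth-first search backward from the goal until a subgoal is
    already satisfied by the initial scene state."""
    start = frozenset(goal)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        subgoal, plan = queue.popleft()
        if subgoal <= initial:
            return plan
        if len(plan) >= max_depth:
            continue
        for action in actions:
            prev = regress(subgoal, action)
            if prev is not None and prev not in seen:
                seen.add(prev)
                # The action is prepended because the search runs backward.
                queue.append((prev, [action.name] + plan))
    return None


# Toy scene: the lamp is on the floor and the vase occupies the table.
initial = frozenset({"on(lamp,floor)", "on(vase,table)", "clear(shelf)"})
goal = {"on(lamp,table)"}

actions = [
    Action(
        name="move_vase_to_shelf",
        pre=frozenset({"on(vase,table)", "clear(shelf)"}),
        add=frozenset({"on(vase,shelf)", "clear(table)"}),
        delete=frozenset({"on(vase,table)", "clear(shelf)"}),
    ),
    Action(
        name="move_lamp_to_table",
        pre=frozenset({"on(lamp,floor)", "clear(table)"}),
        add=frozenset({"on(lamp,table)"}),
        delete=frozenset({"on(lamp,floor)", "clear(table)"}),
    ),
]

plan = plan_backward(initial, goal, actions)
print(plan)
```

In this toy case the search starts from the requested state (lamp on the table), finds that the table must first be clear, and regresses that requirement to moving the vase. Only objects that the goal depends on are touched, which is the property that distinguishes planning-style edits from regenerating the whole scene.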
From the abstract
Editing a 3D indoor scene from natural language is conceptually straightforward but technically challenging. Existing open-vocabulary systems often regenerate large portions of a scene or rely on image-space edits that disrupt spatial structure, resulting in unintended global changes or physically inconsistent layouts. These limitations stem from treating editing primarily as a generative task. We take a different view. A user instruction defines a desired world state, and editing should be the