Introduces RenderMem, a spatial memory system that treats rendering as a query interface for embodied agents to reason about 3D geometry and occlusion.
March 17, 2026
Original Paper
RenderMem: Rendering as Spatial Memory Retrieval
arXiv · 2603.14669
The Takeaway
Unlike static observation logs, RenderMem lets agents 'see' the world from any future viewpoint implied by a query. This enables better reasoning for tasks involving line-of-sight or hidden-object localization in dynamic environments.
From the abstract
Embodied reasoning is inherently viewpoint-dependent: what is visible, occluded, or reachable depends critically on where the agent stands. However, existing spatial memory systems for embodied agents typically store either multi-view observations or object-centric abstractions, making it difficult to perform reasoning with explicit geometric grounding. We introduce RenderMem, a spatial memory framework that treats rendering as the interface between 3D world representations and spatial reasoning.
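The excerpt does not describe RenderMem's actual representation or renderer, but the core idea of "rendering as a query interface" can be sketched in miniature: given some 3D world model, answer a viewpoint-dependent question (here, line of sight) by casting rays from a candidate viewpoint. The sketch below assumes a simple boolean occupancy grid and a step-based ray march; the function names `line_of_sight` and `best_viewpoint` are illustrative, not from the paper.

```python
import numpy as np

def line_of_sight(occ, start, target, step=0.25):
    """Return True if `target` is visible from `start` in occupancy grid `occ`.

    Marches a ray from start toward target, sampling the grid every `step`
    units; any occupied cell strictly between the endpoints blocks the view.
    """
    start = np.asarray(start, dtype=float)
    target = np.asarray(target, dtype=float)
    direction = target - start
    dist = np.linalg.norm(direction)
    if dist == 0.0:
        return True
    direction /= dist
    t = step
    while t < dist - step:  # skip the endpoint cells themselves
        i, j, k = np.floor(start + t * direction).astype(int)
        if occ[i, j, k]:
            return False
        t += step
    return True

def best_viewpoint(occ, candidates, target):
    """Answer a spatial query by rendering: return the first candidate
    viewpoint with a clear line of sight to `target`, or None."""
    for v in candidates:
        if line_of_sight(occ, v, target):
            return v
    return None
```

A memory that supports this kind of query can evaluate viewpoints the agent has never physically occupied, which is what distinguishes it from a static log of past observations.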