Cortical Policy introduces a dual-stream view transformer, inspired by the human brain's dorsal and ventral visual pathways, for complex robotic manipulation tasks.
March 24, 2026
Original Paper
Cortical Policy: A Dual-Stream View Transformer for Robotic Manipulation
arXiv · 2603.21051
The Takeaway
By integrating static 3D foundation-model features with dynamic egocentric gaze estimation, this architecture significantly outperforms the previous state of the art on the COLOSSEUM benchmark. It offers a new template for multi-view robotic perception, one that handles spatial complexity and dynamic changes at the same time.
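To make the "two streams, many views" idea concrete, here is a minimal NumPy sketch of the general pattern. It is built on loudly stated assumptions and is not the paper's architecture, which this summary does not detail. The per-patch feature tensors, the gaze-saliency gating of the dynamic stream, the concatenate-and-project fusion, the single-head attention over all views' tokens and the mean-pooled action head are all hypothetical choices. The names `dual_stream_tokens` and `cross_view_attention` are illustrative only.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax along `axis`.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def dual_stream_tokens(static_feats, dynamic_feats, gaze_saliency, w_fuse):
    """Fuse two per-patch feature streams into one token per (view, patch).

    Hypothetical shapes:
      static_feats:  (V, P, D) e.g. features from a frozen 3D foundation model
      dynamic_feats: (V, P, D) e.g. features from a gaze-aware encoder
      gaze_saliency: (V, P)    per-patch weights in [0, 1]
      w_fuse:        (2D, D)   learned projection
    """
    # Assumption: gaze saliency gates the dynamic stream patch by patch.
    gated = dynamic_feats * gaze_saliency[..., None]
    # Concatenate the two streams channel-wise, then project back to width D.
    fused = np.concatenate([static_feats, gated], axis=-1) @ w_fuse
    V, P, D = fused.shape
    # Flatten views and patches into one sequence so attention can mix views.
    return fused.reshape(V * P, D)


def cross_view_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over all views' tokens."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    attn = softmax(scores, axis=-1)  # each row sums to 1
    return attn @ v, attn


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    V, P, D, A = 3, 16, 32, 8  # views, patches/view, width, action dim (hypothetical)
    static_feats = rng.standard_normal((V, P, D))
    dynamic_feats = rng.standard_normal((V, P, D))
    gaze = rng.uniform(0.0, 1.0, size=(V, P))
    w_fuse = rng.standard_normal((2 * D, D)) / np.sqrt(2 * D)
    w_q, w_k, w_v = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))
    w_act = rng.standard_normal((D, A)) / np.sqrt(D)

    tokens = dual_stream_tokens(static_feats, dynamic_feats, gaze, w_fuse)
    out, attn = cross_view_attention(tokens, w_q, w_k, w_v)
    action = out.mean(axis=0) @ w_act  # hypothetical pooled action head
    print(tokens.shape, attn.shape, action.shape)  # (48, 32) (48, 48) (8,)
```

Flattening the (view, patch) grid into a single token sequence is what lets attention relate patches seen from different viewpoints. That is one plausible route to the cross-view 3D spatial reasoning the abstract says view-specific processing lacks, not a claim about how Cortical Policy does it.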
From the abstract
View transformers process multi-view observations to predict actions and have shown impressive performance in robotic manipulation. Existing methods typically extract static visual representations in a view-specific manner, leading to inadequate 3D spatial reasoning ability and a lack of dynamic adaptation. Taking inspiration from how the human brain integrates static and dynamic views to address these challenges, we propose Cortical Policy, a novel dual-stream view transformer for robotic manip