Reduces visual tokens in robot policies by 78% by using inter-layer rank consistency instead of simple attention magnitude.
March 27, 2026
Original Paper
Beyond Attention Magnitude: Leveraging Inter-layer Rank Consistency for Efficient Vision-Language-Action Models
arXiv · 2603.24941
The Takeaway
TIES challenges the common belief that attention magnitude is the best proxy for token importance, showing that high-attention tokens can sometimes be noise. By selecting tokens based on how consistently they are ranked across layers, it significantly boosts robot inference efficiency and success rates.
From the abstract
Vision-Language-Action (VLA) models excel in robotic manipulation but suffer from significant inference latency due to processing dense visual tokens. Existing token reduction methods predominantly rely on attention magnitude as a static selection criterion. In this work, we challenge this assumption, revealing that high-attention tokens are task-dependent and can even degrade policy performance. To address this, we introduce **TIES** (**T**au-guided **I**nter-layer **E**fficient …
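To make the core idea concrete, here is a minimal sketch of rank-consistency-based token selection. This is not the paper's implementation: the function name, the use of per-token rank standard deviation as the consistency measure, and the tie-breaking by mean rank are all illustrative assumptions; the paper's tau-guided criterion may differ in detail.

```python
import numpy as np

def select_tokens_by_rank_consistency(attn, keep_ratio=0.22):
    """Illustrative sketch (not the paper's method).

    attn: [num_layers, num_tokens] array of per-layer attention scores
    for the visual tokens. Returns sorted indices of the tokens kept.
    """
    num_layers, num_tokens = attn.shape
    # Rank tokens within each layer (rank 0 = highest attention).
    order = np.argsort(-attn, axis=1)
    ranks = np.empty_like(order)
    rows = np.arange(num_layers)[:, None]
    ranks[rows, order] = np.arange(num_tokens)[None, :]
    # Consistency proxy (assumed): a token whose rank barely moves
    # across layers is "consistent", so penalize rank std.
    consistency = -ranks.std(axis=0)
    # Prefer tokens that are both stable AND ranked high on average.
    score = consistency - ranks.mean(axis=0)
    k = max(1, int(round(keep_ratio * num_tokens)))
    keep = np.argsort(-score)[:k]
    return np.sort(keep)
```

A `keep_ratio` of 0.22 mirrors the 78% reduction mentioned above; a token that every layer ranks first beats one that spikes to the top in a single layer, which is exactly the intuition that high attention magnitude alone can be noise.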