AI & ML Breaks Assumption

Cuts visual tokens in robot policies by 78% by selecting them via inter-layer rank consistency rather than raw attention magnitude.

March 27, 2026

Original Paper

Beyond Attention Magnitude: Leveraging Inter-layer Rank Consistency for Efficient Vision-Language-Action Models

Peiju Liu, Jinming Liu, Xipeng Qiu, Xuanjing Huang

arXiv · 2603.24941

The Takeaway

TIES challenges the common belief that attention magnitude is the best proxy for token importance, showing that high-attention tokens can sometimes be noise. By selecting tokens based on how consistently they are ranked across layers, it significantly boosts robot inference efficiency and success rates.
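The paper's implementation is not reproduced here, but the core idea — scoring each visual token by how stably it ranks across transformer layers, instead of by its raw attention magnitude in any single layer — can be sketched roughly. In this hypothetical example, `attn` holds a per-layer attention score for each visual token, and tokens are kept when they rank both highly and consistently; the scoring function, the `keep_ratio` value, and the way layer ranks are combined are all assumptions, not the authors' method.

```python
import numpy as np

def select_tokens_by_rank_consistency(attn, keep_ratio=0.22):
    """Hypothetical sketch of rank-consistency token selection.

    attn: (num_layers, num_tokens) array, where attn[l, t] is the
          attention score token t receives in layer l (e.g., summed
          over heads and queries).
    Returns sorted indices of the tokens to keep.
    """
    num_layers, num_tokens = attn.shape
    # Rank tokens within each layer: rank 0 = highest attention.
    ranks = np.argsort(np.argsort(-attn, axis=1), axis=1)
    mean_rank = ranks.mean(axis=0)  # lower = important in most layers
    rank_std = ranks.std(axis=0)    # lower = consistent across layers
    # Prefer tokens that rank high AND stably; a token with one huge
    # attention spike in a single layer gets a poor (high) score.
    score = mean_rank + rank_std
    k = max(1, int(round(keep_ratio * num_tokens)))
    keep = np.argsort(score)[:k]
    return np.sort(keep)

# Toy illustration: token 1 has the single largest attention value
# (a one-layer spike), but token 0 is consistently ranked first in
# the remaining layers, so magnitude-based selection and
# consistency-based selection disagree.
attn = np.array([
    [0.9, 5.0, 0.1, 0.2, 0.3],  # layer 0: token 1 spikes
    [0.9, 0.0, 0.1, 0.2, 0.3],
    [0.9, 0.0, 0.1, 0.2, 0.3],
    [0.9, 0.0, 0.1, 0.2, 0.3],
])
kept = select_tokens_by_rank_consistency(attn, keep_ratio=0.4)
```

With this toy input, the consistently high-ranked token 0 is kept while the noisy single-spike token 1 is dropped, which is the qualitative behavior the takeaway describes.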

From the abstract

Vision-Language-Action (VLA) models excel in robotic manipulation but suffer from significant inference latency due to processing dense visual tokens. Existing token reduction methods predominantly rely on attention magnitude as a static selection criterion. In this work, we challenge this assumption, revealing that high-attention tokens are task-dependent and can even degrade policy performance. To address this, we introduce **TIES** (**T**au-guided **I**nter-layer **E**fficient …