AI & ML Scaling Insight

Challenges the assumption that 'background' pixels are useless in GUI agents and identifies a 'recency effect' for optimal token pruning.

March 30, 2026

Original Paper

Rethinking Token Pruning for Historical Screenshots in GUI Visual Agents: Semantic, Spatial, and Temporal Perspectives

Daiqiang Li, Zihao Pan, Zeyu Zhang, Ronghao Chen, Huacan Wang, Honggang Chen, Haiyun Jiang

arXiv · 2603.26041

The Takeaway

The paper finds that background regions in GUI screenshots are critical for detecting state transitions (e.g., a button being pressed). It offers a blueprint for managing historical screenshots, showing that agents perform better when the token budget is allocated heavily toward recent frames while older frames retain highly compressed background cues.
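To make the recency idea concrete, here is a minimal sketch of one way a fixed visual-token budget could be split across historical screenshots. It is not the paper's method: the function name `allocate_token_budget`, the geometric `decay` factor, and the per-frame floor `min_tokens` are all illustrative assumptions. The floor stands in for the small amount of compressed background context kept for every frame, and the decay skews the rest of the budget toward the most recent frames.

```python
def allocate_token_budget(num_frames, total_budget, decay=0.5, min_tokens=16):
    """Split total_budget visual tokens across frames ordered oldest -> newest.

    Hypothetical helper for illustration only; decay and min_tokens are
    assumed knobs, not values taken from the paper.
    """
    if num_frames <= 0:
        return []
    if total_budget < num_frames * min_tokens:
        raise ValueError("total_budget is too small to give every frame min_tokens")

    # The newest frame (last index) gets weight 1; each older frame is
    # down-weighted by another factor of `decay`.
    weights = [decay ** (num_frames - 1 - i) for i in range(num_frames)]
    total_weight = sum(weights)

    # Every frame keeps a small floor (a stand-in for compressed background
    # cues); the remaining tokens are shared out in proportion to recency.
    spare = total_budget - num_frames * min_tokens
    budgets = [min_tokens + int(spare * w / total_weight) for w in weights]

    # Integer truncation can leave a few tokens unassigned; hand them to
    # the most recent frame so the budget is fully used.
    budgets[-1] += total_budget - sum(budgets)
    return budgets


if __name__ == "__main__":
    # Four historical screenshots sharing 1000 visual tokens.
    print(allocate_token_budget(4, 1000))
```

A pruning step would then keep at most `budgets[i]` tokens from frame `i`. How those tokens are chosen within a frame (for example, how foreground and background regions are balanced) is a separate design decision that this sketch leaves open.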

From the abstract

In recent years, GUI visual agents built upon Multimodal Large Language Models (MLLMs) have demonstrated strong potential in navigation tasks. However, high-resolution GUI screenshots produce a large number of visual tokens, making the direct preservation of complete historical information computationally expensive. In this paper, we conduct an empirical study on token pruning for historical screenshots in GUI scenarios and distill three practical insights that are crucial for designing effectiv