AI & ML Efficiency Breakthrough

GIFT is a training-free frame selection framework that uses 'Directed Diversity' to boost Video-LLM performance by up to 12.5%.

March 27, 2026

Original Paper

GIFT: Global Irreplaceability Frame Targeting for Efficient Video Understanding

Junpeng Ma, Sashuai Zhou, Guanghao Li, Xin Gao, Yue Cao, Hengyu Zeng, Yuxiang Yan, Zhibin Wang, Jun Song, Bo Zheng, Shanghang Zhang, Jian Pu

arXiv · 2603.25072

The Takeaway

Practitioners can significantly reduce the computational cost of long-form video understanding without retraining models. GIFT moves beyond greedy frame selection by assessing the intrinsic 'irreplaceability' of each frame relative to the frame budget.
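For context, the greedy frame selection mentioned above can be sketched roughly as follows. This is not GIFT's algorithm, which this summary does not spell out; it is a generic MMR-style baseline of the kind the abstract criticizes, where relevance and diversity are scored separately and frames are picked one at a time. The function names, the `trade_off` parameter, and the toy feature vectors are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def greedy_select(frame_feats, query_feat, budget, trade_off=0.7):
    """Hypothetical greedy baseline (NOT GIFT): pick `budget` frames one by one.

    Each step scores a candidate as
        trade_off * relevance - (1 - trade_off) * redundancy
    where relevance is similarity to the query and redundancy is the highest
    similarity to any frame already selected. The two terms are computed
    independently, and earlier picks are never revisited.
    """
    relevance = [cosine(f, query_feat) for f in frame_feats]
    selected = []
    remaining = list(range(len(frame_feats)))

    while remaining and len(selected) < budget:
        def score(i):
            redundancy = max(
                (cosine(frame_feats[i], frame_feats[j]) for j in selected),
                default=0.0,
            )
            return trade_off * relevance[i] - (1 - trade_off) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)

    return sorted(selected)  # return indices in temporal order


# Toy 2-D "features": frames 0 and 1 match the query, 2 is unrelated, 3 is opposite.
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
query = [1.0, 0.0]

relevance_heavy = greedy_select(frames, query, budget=2, trade_off=0.7)  # [0, 1]
diversity_heavy = greedy_select(frames, query, budget=2, trade_off=0.3)  # [0, 3]
```

In this toy case the relevance-heavy setting keeps two near-duplicate frames, while the diversity-heavy setting pulls in frame 3, the one least relevant to the query. That is one simple way to picture the noise-frame failure the abstract attributes to greedy selection with decoupled relevance and diversity scores.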

From the abstract

Video Large Language Models (VLMs) have achieved remarkable success in video understanding, but the significant computational cost from processing dense frames severely limits their practical application. Existing methods alleviate this by selecting keyframes, but their greedy decision-making, combined with a decoupled evaluation of relevance and diversity, often falls into local optima and results in erroneously selecting irrelevant noise frames. To address these challenges, we propose GIFT: Gl