GIFT is a training-free frame selection framework that uses 'Directed Diversity' to boost Video-LLM performance by up to 12.5%.
March 27, 2026
Original Paper
GIFT: Global Irreplaceability Frame Targeting for Efficient Video Understanding
arXiv · 2603.25072
The Takeaway
Practitioners can significantly reduce the computational cost of long-form video understanding without retraining models, by passing the Video-LLM only a selected subset of frames. Instead of picking frames greedily, GIFT assesses each frame's intrinsic 'irreplaceability' relative to the frame budget.
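To make "greedy frame selection" concrete, here is a generic toy illustration. It is not GIFT's algorithm, which this summary does not spell out. It assumes a hypothetical per-frame relevance score `rel` (for example, query-to-frame similarity) and a pairwise frame-similarity matrix `sim`. A subset is scored as its total relevance minus `lam` times its pairwise redundancy, a relevance-minus-redundancy objective of the kind used by MMR-style selectors. Greedy selection commits to the single best frame first and never revisits that choice, so on this input it settles on a worse subset than an exhaustive search over all size-k subsets.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def objective(sel: Sequence[int], rel: Sequence[float], sim: Matrix, lam: float) -> float:
    """Total relevance of the chosen frames minus lam * their summed pairwise similarity."""
    relevance = sum(rel[i] for i in sel)
    redundancy = sum(sim[i][j] for i, j in combinations(sel, 2))
    return relevance - lam * redundancy


def greedy_select(rel: Sequence[float], sim: Matrix, k: int, lam: float = 1.0) -> List[int]:
    """Repeatedly add the frame with the largest marginal gain; earlier picks are never revisited."""
    selected: List[int] = []
    for _ in range(k):
        candidates = [i for i in range(len(rel)) if i not in selected]
        best = max(
            candidates,
            key=lambda i: rel[i] - lam * sum(sim[i][j] for j in selected),
        )
        selected.append(best)
    return selected


def global_select(rel: Sequence[float], sim: Matrix, k: int, lam: float = 1.0) -> Tuple[int, ...]:
    """Score every size-k subset and return the best one (exponential cost; toy inputs only)."""
    return max(combinations(range(len(rel)), k), key=lambda sel: objective(sel, rel, sim, lam))


# Toy input (hypothetical numbers): frame 0 is the single most relevant frame,
# but it overlaps with both frames 1 and 2, which do not overlap each other.
rel = [1.0, 0.9, 0.9]
sim = [
    [1.0, 0.6, 0.7],
    [0.6, 1.0, 0.0],
    [0.7, 0.0, 1.0],
]

greedy = greedy_select(rel, sim, k=2)
best = global_select(rel, sim, k=2)
print(greedy, round(objective(greedy, rel, sim, 1.0), 2))  # [0, 1] 1.3
print(best, round(objective(best, rel, sim, 1.0), 2))      # (1, 2) 1.8
```

Exhaustive search scores every one of the C(N, k) subsets, so it is only usable on tiny inputs like this one. The toy exists only to show the local-optimum failure mode that the abstract attributes to greedy decision-making: picking the locally best frame first rules out the better pair of complementary frames.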
From the abstract
Video Large Language Models (VLMs) have achieved remarkable success in video understanding, but the significant computational cost from processing dense frames severely limits their practical application. Existing methods alleviate this by selecting keyframes, but their greedy decision-making, combined with a decoupled evaluation of relevance and diversity, often falls into local optima and results in erroneously selecting irrelevant noise frames. To address these challenges, we propose GIFT: Gl