AI & ML Efficiency Breakthrough

Achieves a 6x compute reduction in multimodal LLMs while improving accuracy by 2%.

March 27, 2026

Original Paper

ReDiPrune: Relevance-Diversity Pre-Projection Token Pruning for Efficient Multimodal LLMs

An Yu, Ting Yu Tsai, Zhenfei Zhang, Weiheng Lu, Felix X.-F. Ye, Ming-Ching Chang

arXiv · 2603.24680

The Takeaway

Unlike previous methods that prune tokens after projection, ReDiPrune operates on the raw encoder outputs using a relevance-diversity rule, sidestepping the usual trade-off in which efficiency gains cost accuracy.
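To make the idea concrete, here is a minimal sketch of a relevance-diversity selection rule applied to raw encoder tokens. This is an illustrative MMR-style greedy heuristic, not the paper's exact algorithm; the function name, the use of cosine similarity, and the `lam` trade-off parameter are all assumptions.

```python
import numpy as np

def redi_select(features, relevance, k, lam=0.5):
    """Greedy relevance-diversity token selection (illustrative sketch).

    features:  (N, D) raw vision-encoder token embeddings
    relevance: (N,)   per-token relevance scores (e.g. attention to a query)
    k:         number of tokens to keep
    lam:       relevance vs. diversity trade-off (assumed parameter)
    """
    # Cosine similarity between all token pairs.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T

    # Seed with the single most relevant token.
    selected = [int(np.argmax(relevance))]
    candidates = set(range(len(relevance))) - set(selected)
    while len(selected) < k:
        best, best_score = None, -np.inf
        for i in candidates:
            # Favor high relevance and low similarity to tokens already kept.
            score = lam * relevance[i] - (1 - lam) * sim[i, selected].max()
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        candidates.remove(best)
    return sorted(selected)
```

Because selection happens before the vision-language projector, the similarity and relevance scores are computed on the full-dimensional encoder features rather than on compressed post-projection representations.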

From the abstract

Recent multimodal large language models are computationally expensive because Transformers must process a large number of visual tokens. We present ReDiPrune, a training-free token pruning method applied before the vision-language projector, where visual features remain rich and discriminative. Unlike post-projection pruning methods that operate on compressed representations, ReDiPrune selects informative tokens directly from vision encoder outputs, preserving fine-grained spatial and s