AI & ML Efficiency Breakthrough

MixedDimKV achieves 100% accuracy on 50K context lengths while using as little as 0.26% of the traditional KV cache.

March 24, 2026

Original Paper

Beyond Token Eviction: Mixed-Dimension Budget Allocation for Efficient KV Cache Compression

Ruijie Miao, Zhiming Wang, Wang Li, Shiwei Wu, Shufan Liu, Yanbing Jiang, Tong Yang

arXiv · 2603.20616

The Takeaway

By moving beyond simple token eviction to granular mixed-dimension budget allocation, this method enables long-context inference on hardware that would otherwise run out of memory, while outperforming standard head-pruning techniques.

From the abstract

Key-value (KV) caching is widely used to accelerate transformer inference, but its memory cost grows linearly with input length, limiting long-context deployment. Existing token eviction methods reduce memory by discarding less important tokens, which can be viewed as a coarse form of dimensionality reduction that assigns each token either zero or full dimension. We propose MixedDimKV, a mixed-dimension KV cache compression method that allocates dimensions to tokens at a more granular level, and
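
The core idea in the abstract can be illustrated with a toy allocator. The sketch below is an illustrative assumption, not the paper's actual algorithm: it gives each cached token a fraction of the head dimension proportional to a hypothetical importance score, under a total dimension budget. Token eviction falls out as the special case where every token receives either zero or the full dimension.

```python
import numpy as np

def allocate_mixed_dims(importance, full_dim, budget_ratio):
    """Toy mixed-dimension allocator (illustrative, not MixedDimKV itself).

    Each token's kept dimension is proportional to its importance score,
    subject to a global budget of budget_ratio * full_dim * num_tokens.
    """
    total_budget = int(budget_ratio * full_dim * len(importance))
    weights = importance / importance.sum()
    dims = np.floor(weights * total_budget).astype(int)
    # No token can keep more than the full head dimension.
    return np.clip(dims, 0, full_dim)

# Hypothetical example: 6 cached tokens, head dimension 64,
# compressing the cache to 25% of its original size.
scores = np.array([0.9, 0.1, 0.4, 0.05, 0.8, 0.3])
dims = allocate_mixed_dims(scores, full_dim=64, budget_ratio=0.25)
```

Under this scheme, an important token keeps most of its key/value dimensions while an unimportant one keeps only a few, rather than the all-or-nothing choice that eviction makes.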