Replaces fixed context compression ratios with a performance-floor constraint to ensure reliable LLM deployment.
March 23, 2026
Original Paper
PoC: Performance-oriented Context Compression for Large Language Models via Performance Prediction
arXiv · 2603.19733
The Takeaway
Context compression at a fixed ratio is often unpredictable, causing catastrophic performance drops on some inputs while being harmless on others. By letting users specify an acceptable performance level instead (e.g., 90% accuracy), this framework uses a lightweight predictor to find the most aggressive compression ratio that still meets that floor, making efficiency gains compatible with production-grade reliability.
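The core idea can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: `predict_performance` stands in for PoC's lightweight performance predictor, and we assume predicted performance falls monotonically as compression grows, so a binary search can find the largest ratio that still clears the floor.

```python
def predict_performance(ratio: float) -> float:
    # Toy monotone stand-in for the learned predictor:
    # performance degrades as the compression ratio increases.
    return 1.0 - 0.5 * ratio ** 2

def max_safe_ratio(floor: float, lo: float = 0.0, hi: float = 1.0,
                   tol: float = 1e-4) -> float:
    """Binary-search the largest compression ratio whose predicted
    performance stays at or above `floor` (assumes monotone decay)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if predict_performance(mid) >= floor:
            lo = mid  # still safe: try compressing harder
        else:
            hi = mid  # floor violated: back off
    return lo

print(round(max_safe_ratio(0.90), 3))  # → 0.447 for this toy predictor
```

With a real predictor the search would query it per input or per task, but the contract is the same: the user supplies a floor, and the system returns the most aggressive ratio the predictor deems safe.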
From the abstract
While context compression can mitigate the growing inference costs of Large Language Models (LLMs) by shortening contexts, existing methods that specify a target compression ratio or length suffer from unpredictable performance degradation, hindering their reliable deployment. We introduce a paradigm shift to Performance-oriented Context Compression (PoC), where developers specify an acceptable performance floor instead of a compression ratio. PoC employs a lightweight performance predictor to a […]