Introduces a reward framework that reduces LLM reasoning verbosity by optimizing for 'Information Density' via entropy reduction per step.
March 19, 2026
Original Paper
InfoDensity: Rewarding Information-Dense Traces for Efficient Reasoning
arXiv · 2603.17310
The Takeaway
Addresses the high computational cost of 'thinking' models (like o1) by penalizing redundant reasoning traces without sacrificing accuracy. It provides a principled way to train models that are both smart and concise.
From the abstract
Large Language Models (LLMs) with extended reasoning capabilities often generate verbose and redundant reasoning traces, incurring unnecessary computational cost. While existing reinforcement learning approaches address this by optimizing final response length, they neglect the quality of intermediate reasoning steps, leaving models vulnerable to reward hacking. We argue that verbosity is not merely a length problem, but a symptom of poor intermediate reasoning quality. […]
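To make the "entropy reduction per step" idea concrete, here is a minimal sketch of what such a reward could look like. This is an illustration, not the paper's actual formulation: it assumes we can read off the model's probability distribution over candidate final answers after each reasoning step, and it scores a step by how much that distribution sharpens (Shannon entropy drops) per token spent. The function names and the per-token normalization are our own assumptions.

```python
import math

def entropy(dist):
    """Shannon entropy (bits) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

def info_density_rewards(step_dists, step_lengths):
    """Hypothetical per-step information-density reward.

    step_dists[i]  : model's distribution over candidate answers
                     after reasoning step i (index 0 = before any step).
    step_lengths[i]: number of tokens in reasoning step i+1.

    Reward for a step = entropy reduction it achieves, divided by
    its token cost, so long-winded steps that barely sharpen the
    answer distribution score near zero (or negative, if they add
    uncertainty).
    """
    rewards = []
    for i in range(1, len(step_dists)):
        delta_h = entropy(step_dists[i - 1]) - entropy(step_dists[i])
        rewards.append(delta_h / step_lengths[i - 1])
    return rewards

# Toy trace over 4 candidate answers: a sharp 10-token step,
# then a verbose 50-token step that changes almost nothing.
dists = [
    [0.25, 0.25, 0.25, 0.25],  # uniform prior: 2.0 bits
    [0.70, 0.10, 0.10, 0.10],  # step 1 sharpens the distribution
    [0.72, 0.10, 0.09, 0.09],  # step 2 is nearly redundant
]
r1, r2 = info_density_rewards(dists, [10, 50])
print(f"step 1 reward: {r1:.4f}")  # large entropy drop / few tokens
print(f"step 2 reward: {r2:.4f}")  # tiny entropy drop / many tokens
```

Under this toy scoring, the concise informative step earns a much higher reward than the padded one, which is the qualitative behavior the takeaway describes: penalize redundancy without touching steps that actually reduce uncertainty about the answer.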