AI & ML Efficiency Breakthrough

Pruning low-utility prompts before RL rollouts allows for 10x more efficient training of large reasoning models.

March 27, 2026

Original Paper

Train at Moving Edge: Online-Verified Prompt Selection for Efficient RL Training of Large Reasoning Model

Jiahao Wu, Ning Lu, Shengcai Liu, Kun Wang, Yanting Yang, Li Qing, Ke Tang

arXiv · 2603.25184

The Takeaway

Reinforcement learning for reasoning (e.g., GRPO) is bottlenecked by expensive rollouts. By identifying the "learning edge" (prompts with high uncertainty and intermediate difficulty), the authors' HIVE method significantly reduces compute cost without sacrificing model performance.

From the abstract

Reinforcement learning (RL) has become essential for post-training large language models (LLMs) in reasoning tasks. While scaling rollouts can stabilize training and enhance performance, the computational overhead is a critical issue. In algorithms like GRPO, multiple rollouts per prompt incur prohibitive costs, as a large portion of prompts provide negligible gradients and are thus of low utility. To address this problem, we investigate how to select high-utility prompts before the rollout phase …
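To see why many prompts yield negligible gradients, recall that GRPO computes advantages relative to the group of rollouts for each prompt: if every rollout for a prompt receives the same reward (all-correct or all-wrong), every advantage is zero and the prompt contributes no learning signal. The sketch below illustrates this, plus a simplified pass-rate filter for "intermediate difficulty" prompts. Note that HIVE selects prompts *before* rollouts; this post-hoc filter, and the names `grpo_advantages`, `is_high_utility`, and the thresholds, are hypothetical illustrations, not the paper's method.

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages as in GRPO: each rollout's reward minus
    the group mean, normalized by the group's standard deviation."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards)
    if sigma == 0:
        # Uniform rewards (all pass or all fail): zero advantage everywhere,
        # hence zero policy gradient -- the prompt's rollouts were wasted.
        return [0.0] * len(rewards)
    return [(r - mu) / sigma for r in rewards]

def is_high_utility(rewards, low=0.2, high=0.8):
    """Hypothetical 'learning edge' filter: keep a prompt only if its
    rollout pass rate is intermediate (neither trivially easy nor
    hopelessly hard). Thresholds are illustrative, not from the paper."""
    pass_rate = sum(rewards) / len(rewards)
    return low < pass_rate < high

print(grpo_advantages([1, 1, 1, 1]))  # [0.0, 0.0, 0.0, 0.0] -- no signal
print(is_high_utility([1, 1, 1, 1]))  # False: all-correct, low utility
print(is_high_utility([1, 0, 1, 0]))  # True: mixed outcomes, useful gradient
```

HIVE's contribution is predicting which prompts fall in this useful band without paying for their rollouts first; the filter above merely shows what "low utility" means after the fact.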