AI & ML Efficiency Breakthrough

Introduces the first reinforcement learning framework to compress implicit reasoning steps in looped language models.

March 23, 2026

Original Paper

LoopRPT: Reinforcement Pre-Training for Looped Language Models

Guo Tang, Shixin Jiang, Heng Chang, Nuo Chen, Yuhan Li, Huiming Fan, Jia Li, Ming Liu, Bing Qin

arXiv · 2603.19714

The Takeaway

Looped LMs offer a compact alternative to explicit chain-of-thought reasoning, but their latent reasoning steps are hard to train because they never surface as tokens that standard RL can reward. LoopRPT applies RL signals directly to these intermediate representations, letting models reach higher accuracy with fewer loop iterations, a Pareto improvement on both axes.
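To make the contrast concrete, here is a minimal sketch of the looped-LM idea the takeaway describes: a single shared block is applied K times to refine a latent state, rather than emitting K explicit chain-of-thought tokens. Everything here (`loop_step`, the toy residual update, the dimensions) is illustrative and not drawn from the paper's implementation.

```python
import numpy as np

# Hypothetical sketch of a looped LM forward pass (not the authors' code).
rng = np.random.default_rng(0)
d = 8                                      # hidden size (illustrative)
W = rng.normal(scale=0.1, size=(d, d))     # shared block weights, reused every loop

def loop_step(h):
    """One latent refinement step: a toy residual block with tanh."""
    return h + np.tanh(h @ W)

def looped_forward(h0, k):
    """Apply the same block k times; k trades extra compute for refinement."""
    h = h0
    for _ in range(k):
        h = loop_step(h)
    return h

h0 = rng.normal(size=d)
shallow = looped_forward(h0, 2)   # few iterations: cheap, coarse latent state
deep = looped_forward(h0, 8)      # more iterations: further-refined state
```

The structural mismatch the abstract points to follows from this picture: conventional RL rewards the tokens a model emits, but here the "reasoning" lives in the intermediate states `h`, which ordinary output-level rewards never touch.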

From the abstract

Looped language models (LoopLMs) perform iterative latent computation to refine internal representations, offering a promising alternative to explicit chain-of-thought (CoT) reasoning. However, existing reinforcement learning (RL) paradigms primarily target output tokens, creating a structural mismatch with looped architectures whose reasoning unfolds implicitly. In this work, we propose LoopRPT, a reinforcement pre-training framework tailored for LoopLMs. By reframing next-token prediction as a