A compact group of task-agnostic neurons acts as a dedicated switch for logic in large language models.
April 29, 2026
Original Paper
Why Does Reinforcement Learning Generalize? A Feature-Level Mechanistic Study of Post-Training in Large Language Models
arXiv · 2604.25011
The Takeaway
Reinforcement learning is known to improve how models handle new problems, but the mechanism behind that generalization has remained unclear. This mechanistic study identifies a specific set of internal features that mediate how models apply logic to novel tasks. Amplifying these neurons in a base model significantly improves its performance without any additional training. The finding suggests that generalization is not a diffuse property of the whole network but is localized in identifiable circuits. Engineering those circuits directly could lead to more efficient and capable models that require less data to master complex reasoning.
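The amplification described above is a form of activation steering: scale the activations of a chosen set of hidden units at inference time and leave the weights untouched. The sketch below illustrates the idea on a toy two-layer block; the weight matrices, the "logic neuron" indices, and the gain factor are all illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MLP block standing in for one layer of a transformer.
d_model, d_hidden = 8, 32
W_in = rng.normal(size=(d_model, d_hidden))
W_out = rng.normal(size=(d_hidden, d_model))

LOGIC_NEURONS = [3, 11, 27]  # hypothetical feature indices
GAIN = 2.0                   # hypothetical amplification factor

def forward(x, amplify=False):
    """Forward pass, optionally amplifying the selected hidden units."""
    h = np.maximum(x @ W_in, 0.0)       # ReLU hidden activations
    if amplify:
        h = h.copy()
        h[LOGIC_NEURONS] *= GAIN        # scale only the target features
    return h @ W_out

x = rng.normal(size=(d_model,))
base = forward(x)
steered = forward(x, amplify=True)
# The outputs differ only by the extra contribution of the scaled neurons.
```

In a real model the same effect would be achieved with a forward hook on the chosen layer; the point is that no gradient update is involved, only a runtime rescaling of a few identified features.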
From the abstract
Reinforcement learning (RL)-based post-training often improves the reasoning performance of large language models (LLMs) beyond the training domain, while supervised fine-tuning (SFT) frequently leads to forgetting of general capabilities. However, the mechanisms underlying this contrast remain unclear. To bridge this gap, we present a feature-level mechanistic analysis methodology to probe RL generalization using a controlled experimental setup, where RL- and SFT-tuned models are trained from the …