Asking an AI to "show its work" can actually make it dumber if it picks up a sloppy or repetitive way of thinking.
April 3, 2026
Original Paper
On the Role of Reasoning Patterns in the Generalization Discrepancy of Long Chain-of-Thought Supervised Fine-Tuning
arXiv · 2604.01702
The Takeaway
Training on certain styles of exploratory reasoning traces teaches a model bad habits, like circling back over the same steps until it gets stuck in loops. The finding suggests that for AI reasoning, the quality of the "thought process" in the training data matters at least as much as whether the final answer is right.
From the abstract
Supervised Fine-Tuning (SFT) on long Chain-of-Thought (CoT) trajectories has become a pivotal phase in building large reasoning models. However, how CoT trajectories from different sources influence the generalization performance of models remains an open question. In this paper, we conduct a comparative study using two sources of verified CoT trajectories generated by two competing models, DeepSeek-R1-0528 and gpt-oss-120b, with their problem sets controlled to be identical. …
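To make the experimental control concrete, here is a minimal sketch of the data setup the abstract describes: two pools of CoT trajectories, one per teacher model, kept only if verified and then restricted to a shared problem set so the two SFT corpora cover identical problems. The field names and the answer-matching verification rule are assumptions for illustration, not the paper's actual code.

```python
from dataclasses import dataclass

@dataclass
class Trajectory:
    problem_id: str  # identifies the underlying reasoning problem
    source: str      # teacher model: "DeepSeek-R1-0528" or "gpt-oss-120b"
    cot: str         # the long chain-of-thought text
    answer: str      # final answer extracted from the trajectory
    verified: bool   # True if the answer matched the reference (assumed check)

def build_matched_corpora(
    trajs: list[Trajectory],
) -> tuple[list[Trajectory], list[Trajectory]]:
    """Keep only verified trajectories, then restrict both sources to the
    intersection of problem ids so the two SFT problem sets are identical."""
    verified = [t for t in trajs if t.verified]
    r1 = {t.problem_id: t for t in verified if t.source == "DeepSeek-R1-0528"}
    oss = {t.problem_id: t for t in verified if t.source == "gpt-oss-120b"}
    shared = sorted(r1.keys() & oss.keys())
    return [r1[p] for p in shared], [oss[p] for p in shared]
```

A comparison along these lines would then fine-tune the same base model on each corpus separately and compare held-out performance, so any generalization gap is attributable to the reasoning style of the trajectories rather than to which problems were trained on.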