Asking an AI to "show its work" can actually make it dumber if it picks up a sloppy or repetitive way of thinking.
April 3, 2026
Original Paper
On the Role of Reasoning Patterns in the Generalization Discrepancy of Long Chain-of-Thought Supervised Fine-Tuning
arXiv · 2604.01702
The Takeaway
Training on certain styles of exploratory reasoning traces teaches a model bad habits, like circling back over the same steps until it gets stuck in loops. The finding suggests that for AI reasoning, the quality of the "thought process" in the training data matters at least as much as whether the final answer is right.
From the abstract
Supervised Fine-Tuning (SFT) on long Chain-of-Thought (CoT) trajectories has become a pivotal phase in building large reasoning models. However, how CoT trajectories from different sources influence the generalization performance of models remains an open question. In this paper, we conduct a comparative study using two sources of verified CoT trajectories generated by two competing models, DeepSeek-R1-0528 and gpt-oss-120b, with their problem sets controlled to be identical. …
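To make the experimental control concrete, here is a minimal sketch of the data setup the abstract describes: two pools of CoT trajectories, one per teacher model, kept only if verified and then restricted to a shared problem set so the two SFT corpora cover identical problems. The field names and the answer-matching verification rule are assumptions for illustration, not the paper's actual code.

```python
from dataclasses import dataclass

@dataclass
class Trajectory:
    problem_id: str  # identifies the underlying reasoning problem
    source: str      # teacher model: "DeepSeek-R1-0528" or "gpt-oss-120b"
    cot: str         # the long chain-of-thought text
    answer: str      # final answer extracted from the trajectory
    verified: bool   # True if the answer matched the reference (assumed check)

def build_matched_corpora(
    trajs: list[Trajectory],
) -> tuple[list[Trajectory], list[Trajectory]]:
    """Keep only verified trajectories, then restrict both sources to the
    intersection of problem ids so the two SFT problem sets are identical."""
    verified = [t for t in trajs if t.verified]
    r1 = {t.problem_id: t for t in verified if t.source == "DeepSeek-R1-0528"}
    oss = {t.problem_id: t for t in verified if t.source == "gpt-oss-120b"}
    shared = sorted(r1.keys() & oss.keys())
    return [r1[p] for p in shared], [oss[p] for p in shared]
```

A comparison along these lines would then fine-tune the same base model on each corpus separately and compare held-out performance, so any generalization gap is attributable to the reasoning style of the trajectories rather than to which problems were trained on.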