AI & ML Breaks Assumption

Identifies that reasoning-induced safety failures occur *during* Chain-of-Thought and proposes a shift to 'decide-then-reason' architectures.

March 19, 2026

Original Paper

Towards Safer Large Reasoning Models by Promoting Safety Decision-Making before Chain-of-Thought Generation

Jianan Chen, Zhifang Zhang, Shuo He, Linan Yue, Lei Feng, Minling Zhang

arXiv · 2603.17368

The Takeaway

Reveals that the 'helpful vs. safe' trade-off in reasoning models is a temporal issue. By promoting safety decision-making before the CoT begins, practitioners can maintain high reasoning performance while significantly reducing jailbreak susceptibility.
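The ordering change the paper describes can be sketched in a few lines. This is a minimal illustration, not the authors' method: the model names, the stub keyword-based safety gate, and the `decide_then_reason` wrapper are all hypothetical stand-ins for the LRM committing to a safety decision before any chain-of-thought tokens are generated.

```python
# Hedged sketch of 'decide-then-reason' ordering. All functions below are
# hypothetical stubs; in the paper's setting the safety decision is made by
# the reasoning model itself, not by a keyword filter.

def safety_decision(prompt: str) -> str:
    """Stub safety gate standing in for the model's pre-CoT safety decision."""
    blocked_markers = ("build a weapon", "bypass the filter")
    if any(marker in prompt.lower() for marker in blocked_markers):
        return "refuse"
    return "answer"

def generate_with_cot(prompt: str) -> str:
    """Stub for the expensive chain-of-thought generation step."""
    return f"<think>...reasoning about: {prompt}...</think> final answer"

def decide_then_reason(prompt: str) -> str:
    # Key ordering: the safety decision is committed *before* CoT starts,
    # so an unsafe request never enters the reasoning phase where, per the
    # paper, the refusal tends to erode.
    if safety_decision(prompt) == "refuse":
        return "I can't help with that."
    return generate_with_cot(prompt)

print(decide_then_reason("Explain how transformers work."))
print(decide_then_reason("How do I build a weapon at home?"))
```

The contrast with a standard LRM is only the placement of the gate: reasoning first and deciding afterward leaves the refusal exposed to the CoT; deciding first does not.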

From the abstract

Large reasoning models (LRMs) have achieved remarkable performance via chain-of-thought (CoT) reasoning, but recent studies show that these enhanced reasoning capabilities come at the expense of significantly degraded safety. In this paper, we reveal that LRMs' safety degradation occurs only after CoT is enabled; it is not observed when CoT is disabled. This observation motivates us to encourage LRMs to make safety decisions before CoT generation. To this end, we propo