AI & ML New Capability

Defines 'Reasoning Safety' as a new security dimension and introduces a real-time monitor to detect logic-chain hijackings.

March 27, 2026

Original Paper

Beyond Content Safety: Real-Time Monitoring for Reasoning Vulnerabilities in Large Language Models

Xunguang Wang, Yuguang Zhou, Qingyue Wang, Zongjie Li, Ruixuan Huang, Zhenlan Ji, Pingchuan Ma, Shuai Wang

arXiv · 2603.25412

The Takeaway

As LLMs move toward long-horizon Chain-of-Thought reasoning, existing output-based safety filters are insufficient. This paper provides a taxonomy and a parallel monitoring framework to catch adversarial manipulation of the reasoning process itself before a final answer is generated.

From the abstract

Large language models (LLMs) increasingly rely on explicit chain-of-thought (CoT) reasoning to solve complex tasks, yet the safety of the reasoning process itself remains largely unaddressed. Existing work on LLM safety focuses on content safety--detecting harmful, biased, or factually incorrect outputs -- and treats the reasoning chain as an opaque intermediate artifact. We identify reasoning safety as an orthogonal and equally critical security dimension: the requirement that a model's reasoni