Defines 'Reasoning Safety' as a new security dimension and introduces a real-time monitor to detect logic-chain hijackings.
March 27, 2026
Original Paper
Beyond Content Safety: Real-Time Monitoring for Reasoning Vulnerabilities in Large Language Models
arXiv · 2603.25412
The Takeaway
As LLMs move toward long-horizon Chain-of-Thought reasoning, existing output-based safety filters are insufficient. This paper provides a taxonomy and a parallel monitoring framework to catch adversarial manipulation of the reasoning process itself before a final answer is generated.
From the abstract
Large language models (LLMs) increasingly rely on explicit chain-of-thought (CoT) reasoning to solve complex tasks, yet the safety of the reasoning process itself remains largely unaddressed. Existing work on LLM safety focuses on content safety--detecting harmful, biased, or factually incorrect outputs -- and treats the reasoning chain as an opaque intermediate artifact. We identify reasoning safety as an orthogonal and equally critical security dimension: the requirement that a model's reasoni