Safety alignment does not have to be a 'tax' on performance; it can improve mathematical reasoning accuracy.
March 27, 2026
Original Paper
SafeMath: Inference-time Safety improves Math Accuracy
arXiv · 2603.25201
The Takeaway
Safety guardrails are often assumed to degrade model capabilities. SafeMath reports the opposite: separating harmful language from the underlying mathematical logic lets models reason more clearly, improving accuracy on arithmetic tasks while keeping outputs safe.
From the abstract
Recent research points toward LLMs being manipulated through adversarial and seemingly benign inputs, resulting in harmful, biased, or policy-violating outputs. In this paper, we study an underexplored issue concerning harmful and toxic mathematical word problems. We show that math questions, particularly those framed as natural language narratives, can serve as a subtle medium for propagating biased, unethical, or psychologically harmful content, with heightened risks in educational settings in