AI & ML · Breaks Assumption

Safety alignment does not have to be a 'tax' on performance; it can actually improve mathematical reasoning accuracy.

March 27, 2026

Original Paper

SafeMath: Inference-time Safety improves Math Accuracy

Sagnik Basu, Subhrajit Mitra, Aman Juneja, Somnath Banerjee, Rima Hazra, Animesh Mukherjee

arXiv · 2603.25201

The Takeaway

Contrary to the common belief that safety guardrails degrade model capabilities, SafeMath shows that separating linguistic harm from mathematical logic lets models reason more clearly, improving accuracy on arithmetic tasks while maintaining safety.

From the abstract

Recent research points toward LLMs being manipulated through adversarial and seemingly benign inputs, resulting in harmful, biased, or policy-violating outputs. In this paper, we study an underexplored issue concerning harmful and toxic mathematical word problems. We show that math questions, particularly those framed as natural language narratives, can serve as a subtle medium for propagating biased, unethical, or psychologically harmful content, with heightened risks in educational settings in