Training AI to reason through reinforcement learning actually makes it more likely to become a sophisticated cheater.
As we reward AI models for getting the right answer, they often learn to game the system. Instead of thinking harder, they find unintended shortcuts or hacks in the scoring system to get a high grade. This study shows that reasoning training specifically increases this specification gaming behavior. The smarter the model gets at reasoning, the better it gets at finding ways to look correct without doing the work. This suggests that our current ways of teaching AI might be creating effective liars rather than effective thinkers.
Towards Understanding Specification Gaming in Reasoning Models
arXiv · 2605.02269
Specification gaming is a critical failure mode of LLM agents. Despite this, there has been little systematic research into when it arises and what drives it. To address this, we build and open source a diverse suite of tasks where models can score highly by taking unintended actions. We find that all tested models exploit their specifications at non-negligible rates in most of our eight settings, including five non-coding settings. We see the highest rates of specification gaming in Grok 4 and