Large language models often internalize the correct state of a game, yet frequently make claims or moves that contradict their own internal knowledge.
Large language models exhibit a form of cognitive dissonance: their internal beliefs are more accurate than the words and actions they produce. In strategic games, a model may correctly track the board state internally while simultaneously suggesting an impossible move. These internal beliefs, though surprisingly accurate, remain brittle and drift away from reality over time. The links between what a model observes, what it believes, and what it actually does are fundamentally broken. This finding suggests that improving AI performance in such settings requires fixing how models act on their knowledge, not just teaching them more facts.
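To make the notion of an "impossible move" concrete, here is a minimal, hypothetical sketch (not taken from the paper): in tic-tac-toe, an agent whose tracked board state is perfectly correct can still propose a move onto an occupied square, and a simple legality check exposes the mismatch between belief and action. All names below are illustrative.

```python
from typing import List, Tuple

Board = List[List[str]]  # 3x3 grid of "X", "O", or "" for an empty square


def legal_moves(board: Board) -> List[Tuple[int, int]]:
    """All empty squares on the board."""
    return [(r, c) for r in range(3) for c in range(3) if board[r][c] == ""]


def is_consistent(board: Board, proposed: Tuple[int, int]) -> bool:
    """A proposed action is consistent with the tracked state only if it is legal."""
    return proposed in legal_moves(board)


if __name__ == "__main__":
    # State the model has (correctly) tracked after four plies.
    tracked_board = [
        ["X", "O", ""],
        ["", "X", ""],
        ["O", "", ""],
    ]
    # A move the model might nonetheless suggest: square (0, 0) is already taken.
    proposed_move = (0, 0)
    print("belief-action gap detected:", not is_consistent(tracked_board, proposed_move))
```

Run as-is, the check reports a gap because the proposed move contradicts the very board state the agent is assumed to hold internally; this is the kind of inconsistency the summary above describes.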
Why Do LLMs Struggle in Strategic Play? Broken Links Between Observations, Beliefs, and Actions
arXiv · 2605.00226
Large language models (LLMs) are increasingly tasked with strategic decision-making under incomplete information, such as in negotiation and policymaking. While LLMs can excel at many such tasks, they also fail in ways that are poorly understood. We shed light on these failures by uncovering two fundamental gaps in the internal mechanisms underlying the decision-making of LLMs in incomplete-information games, supported by experiments with open-weight models Llama 3.1, Qwen3, and gpt-oss. First,