Finds that while frontier LLMs can model the mental states of others, they fundamentally fail at self-modeling without explicit reasoning steps.
March 30, 2026
Original Paper
Selective Deficits in LLM Mental Self-Modeling in a Behavior-Based Test of Theory of Mind
arXiv · 2603.26089
The Takeaway
This research identifies a 'selective deficit' in LLM cognition, showing that models cannot strategically act on their own knowledge states unless forced through a chain-of-thought scratchpad. It challenges the assumption that Theory of Mind capabilities in LLMs are uniform and provides a new benchmark for testing internal causal mental models.
From the abstract
The ability to represent oneself and others as agents with knowledge, intentions, and belief states that guide their behavior - Theory of Mind - is a human universal that enables us to navigate - and manipulate - the social world. It is supported by our ability to form mental models of ourselves and others. Its ubiquity in human affairs entails that LLMs have seen innumerable examples of it in their training data and therefore may have learned to mimic it, but whether they have actually learned