AI & ML · Breaks Assumption

Finds that while frontier LLMs can model the mental states of others, they fundamentally fail at self-modeling without explicit reasoning steps.

March 30, 2026

Original Paper

Selective Deficits in LLM Mental Self-Modeling in a Behavior-Based Test of Theory of Mind

Christopher Ackerman

arXiv · 2603.26089

The Takeaway

This research identifies a 'selective deficit' in LLM cognition: models cannot strategically act on their own knowledge states unless forced to reason through a chain-of-thought scratchpad. It challenges the assumption that Theory of Mind capabilities in LLMs are uniform and provides a new benchmark for testing internal causal mental models.

From the abstract

The ability to represent oneself and others as agents with knowledge, intentions, and belief states that guide their behavior - Theory of Mind - is a human universal that enables us to navigate - and manipulate - the social world. It is supported by our ability to form mental models of ourselves and others. Its ubiquity in human affairs entails that LLMs have seen innumerable examples of it in their training data and therefore may have learned to mimic it, but whether they have actually learned

