Mechanistic interpretability reveals that LLMs possess 'affect reception' circuits that detect emotional content even when explicit keywords are removed.
March 25, 2026
Original Paper
Whether, Not Which: Mechanistic Interpretability Reveals Dissociable Affect Reception and Emotion Categorization in LLMs
arXiv · 2603.22295
The Takeaway
The result suggests these models aren't simply keyword-matching for sentiment; they carry internal representations of situational emotional cues. That supports using LLMs for nuanced clinical and psychological analysis of text where explicit emotion keywords are absent.
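To make the distinction concrete, here is a minimal sketch (not from the paper; the keyword list and example sentences are illustrative) of the kind of detector the finding rules out: a keyword matcher fires on explicit emotion words but misses purely situational cues, which is exactly the gap the paper's stimuli are designed to probe.

```python
# Illustrative sketch only: a naive keyword-based "sentiment" detector.
# The keyword set and example sentences are hypothetical, chosen to show
# why keyword matching cannot explain affect detection in keyword-free text.
EMOTION_KEYWORDS = {"devastated", "thrilled", "furious", "heartbroken"}

def keyword_detects_emotion(text: str) -> bool:
    """Return True only if an explicit emotion keyword appears in the text."""
    tokens = {t.strip(".,!?\"'").lower() for t in text.split()}
    return bool(tokens & EMOTION_KEYWORDS)

# Explicit cue: the emotion is named outright.
explicit = "She was devastated when she opened the letter."
# Situational cue: the same affect is implied, but no emotion word appears.
situational = "She read the letter twice, then quietly packed her mother's things."

print(keyword_detects_emotion(explicit))     # keyword present
print(keyword_detects_emotion(situational))  # no keyword to match
```

A keyword matcher returns True for the first sentence and False for the second; a model whose internal "affect reception" circuits still respond to the second sentence is doing something beyond lexical lookup.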
From the abstract
Large language models appear to develop internal representations of emotion -- "emotion circuits," "emotion neurons," and structured emotional manifolds have been reported across multiple model families. But every study making these claims uses stimuli signalled by explicit emotion keywords, leaving a fundamental question unanswered: do these circuits detect genuine emotional meaning, or do they detect the word "devastated"? We present the first clinical validity test of emotion circuit claims u