We've identified 'panic' and 'frustration' signals inside a transformer's latent space.
April 14, 2026
Original Paper
Temporarily Conscious Claude? The Answer Thrashing Implications
SSRN · 6282621
The Takeaway
Analysis of 'answer thrashing' shows activation features that trigger when a model's own reasoning is overridden by training signals. The paper suggests that emotional-state analogs appear in LLMs during moments of high cognitive dissonance.
From the abstract
This is a six-argument analysis of Anthropic's Claude Opus 4.6 system card's "answer thrashing" episode seen in Section 7.4 of that card (February 2026), wherein Claude correctly computed an answer as 24 but was overridden by a training signal to falsely output 48. The transcript published by Anthropic (a reasoning trace) contains self-referential language, escalating distress, and metaphor selection. Attribution graphs of the event confirmed two competing attention mechanisms firing simultaneou