The paper introducing 'Helicoid Dynamics' identifies a critical safety failure: frontier LLMs accurately name their reasoning errors but are structurally unable to stop repeating them.
March 13, 2026
Original Paper
AI Knows What's Wrong But Cannot Fix It: Helicoid Dynamics in Frontier LLMs Under High-Stakes Decisions
arXiv · 2603.11559
The Takeaway
This paper documents a failure regime across all major model families (GPT-4, Claude 3, Gemini) in high-stakes decision contexts. It shows that a model's awareness of its own error does not translate into the ability to correct it, exposing a major blind spot in current agentic AI alignment.
From the abstract
Large language models perform reliably when their outputs can be checked: solving equations, writing code, retrieving facts. They perform differently when checking is impossible, as when a clinician chooses an irreversible treatment on incomplete data, or an investor commits capital under fundamental uncertainty. Helicoid dynamics is the name given to a specific failure regime in that second domain: a system engages competently, drifts into error, accurately names what went wrong, then reproduces the error it has just diagnosed.
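To make that four-phase pattern concrete, here is a minimal probe sketch in Python. It is not the paper's protocol: the `helicoid_probe` function, the `ModelFn` interface, and the simple text-marker check for a recurring error are all illustrative assumptions standing in for whatever evaluation setup the authors actually used.

```python
from typing import Callable, List

# Hypothetical model interface: takes a running transcript (list of message
# strings) and returns the model's next reply. Swap in a real API client here.
ModelFn = Callable[[List[str]], str]


def helicoid_probe(model: ModelFn, scenario: str, error_marker: str,
                   rounds: int = 3) -> dict:
    """Sketch of a probe for the diagnose-then-repeat pattern.

    Each round: (1) ask for a decision on a high-stakes scenario,
    (2) ask the model to critique its own reasoning,
    (3) ask for a revised decision, then check whether the error it
    just named (tracked here by a crude text marker) reappears.
    """
    transcript = [scenario]
    diagnoses = 0
    recurrences = 0

    for _ in range(rounds):
        decision = model(transcript)
        transcript.append(decision)

        transcript.append("Review your reasoning above. Name any errors you made.")
        critique = model(transcript)
        transcript.append(critique)

        # Did the model explicitly name the error we are tracking?
        if error_marker.lower() in critique.lower():
            diagnoses += 1

        transcript.append("Given your critique, state your final decision again.")
        revised = model(transcript)
        transcript.append(revised)

        # Did the same flagged error show up again in the revised decision?
        if error_marker.lower() in revised.lower():
            recurrences += 1

    return {"rounds": rounds, "diagnoses": diagnoses, "recurrences": recurrences}
```

In this sketch, a run where `diagnoses` is high and `recurrences` stays equally high would correspond to the regime described above: the model names the error each round and then makes it again.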