Large language models can recite a task's rules perfectly and then, in the very next response, break every one of them.
Reasoning is typically treated as a linear process in which knowing a rule leads to following it. This study reveals a knows-but-violates dissociation: recall and execution come apart, so a model might list five constraints and then generate an answer that ignores all five. One practical consequence is that having a model recap its instructions does not, by itself, improve adherence. Developers need mechanisms that couple a model's internal recall to its final output, because simply knowing the rules is not enough for an agent to be reliable.
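If the finding holds, the pragmatic workaround is to score adherence on the generated output itself rather than trusting a recap. Below is a minimal sketch of that verification pattern; every name in it (`Constraint`, `check_adherence`, the toy rules) is hypothetical and illustrative, not taken from the paper or from any library.

```python
# Hypothetical sketch: verify constraint adherence against the final output,
# not against the model's (possibly accurate but inert) recap of the rules.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Constraint:
    description: str              # human-readable rule, e.g. from a research brief
    check: Callable[[str], bool]  # programmatic predicate applied to the output

def check_adherence(output: str, constraints: list[Constraint]) -> dict[str, bool]:
    """Score each constraint directly on the output text."""
    return {c.description: c.check(output) for c in constraints}

# Toy rules a model could recite flawlessly and still break.
constraints = [
    Constraint("mentions the control condition", lambda s: "control" in s.lower()),
    Constraint("stays under 50 words", lambda s: len(s.split()) < 50),
]

draft = "We propose a 60-word study design ..."  # stand-in for a model response
for rule, passed in check_adherence(draft, constraints).items():
    print(f"{'PASS' if passed else 'FAIL'}: {rule}")
```

The design choice worth noting: predicates that can be checked programmatically should be, and only genuinely open-ended constraints left to a judge model, since the dissociation suggests self-report is the weakest signal available.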
Models Recall What They Violate: Constraint Adherence in Multi-Turn LLM Ideation
arXiv · 2604.28031
When researchers iteratively refine ideas with large language models, do the models preserve fidelity to the original objective? We introduce DriftBench, a benchmark for evaluating constraint adherence in multi-turn LLM-assisted scientific ideation. Across 2,146 scored benchmark runs spanning seven models from five providers (including two open-weight), four interaction conditions, and 38 research briefs from 24 scientific domains, we find that iterative pressure reliably increases structural co…