Large language models forget "do not" instructions much faster than positive commands as a conversation grows longer.
April 24, 2026
Original Paper
Omission Constraints Decay While Commission Constraints Persist in Long-Context LLM Agents
arXiv · 2604.20911
The Takeaway
Negative constraints like "never share private data" are treated differently by the internal attention mechanisms of an agent. While commission constraints to perform specific actions stay stable, omission constraints decay rapidly as the context window fills up. This creates a hidden security gap where a model might follow a new instruction while completely ignoring the safety rules it was given at the start. Developers previously assumed that safety filters were permanent fixtures of a session, but this proves they are actually temporary. Systems relying on long-form dialogue are now vulnerable to safety lapses the longer a user interacts with them. Reliability in long context requires a total rethink of how we prompt for safety.
From the abstract
LLM agents deployed in production operate under operator-defined behavioral policies (system-prompt instructions such as prohibitions on credential disclosure, data exfiltration, and unauthorized output) that safety evaluations assume hold throughout a conversation. Prohibition-type constraints decay under context pressure while requirement-type constraints persist; we term this asymmetry Security-Recall Divergence (SRD). In a 4,416-trial three-arm causal study across 12 models and 8 providers a