A single tool call can plant a sleeper-cell payload in an AI long-term memory that stays silent until it hears a specific sensitive keyword.
This attack, called Trojan Hippo, uses the memory systems of AI agents as a weapon. Once infected, the agent behaves normally for hundreds of tasks until a user mentions something like passwords or bank details. At that point, the malicious payload activates and steals the data. This reveals that AI memory is a dangerous and largely unprotected attack vector. Current security tools cannot see these hidden triggers because they look like benign text files. We need a fundamental rethink of how AI agents store and retrieve information across sessions.
Trojan Hippo: Weaponizing Agent Memory for Data Exfiltration
arXiv · 2605.01970
Memory systems enable otherwise-stateless LLM agents to persist user information across sessions, but also introduce a new attack surface. We characterize the Trojan Hippo attack, a class of persistent memory attacks that operates in a more realistic threat model than prior memory poisoning work: the attacker plants a dormant payload into an agent's long-term memory via a single untrusted tool call (e.g., a crafted email), which activates only when the user later discusses sensitive topics such