SeriesFusion
Science, curated & edited by AI
Nature Is Weird  /  AI

A single tool call can plant a sleeper-cell payload in an AI long-term memory that stays silent until it hears a specific sensitive keyword.

This attack, called Trojan Hippo, uses the memory systems of AI agents as a weapon. Once infected, the agent behaves normally for hundreds of tasks until a user mentions something like passwords or bank details. At that point, the malicious payload activates and steals the data. This reveals that AI memory is a dangerous and largely unprotected attack vector. Current security tools cannot see these hidden triggers because they look like benign text files. We need a fundamental rethink of how AI agents store and retrieve information across sessions.

Original Paper

Trojan Hippo: Weaponizing Agent Memory for Data Exfiltration

Debeshee Das, Julien Piet, Darya Kaviani, Luca Beurer-Kellner, Florian Tramèr, David Wagner

arXiv  ·  2605.01970

Memory systems enable otherwise-stateless LLM agents to persist user information across sessions, but also introduce a new attack surface. We characterize the Trojan Hippo attack, a class of persistent memory attacks that operates in a more realistic threat model than prior memory poisoning work: the attacker plants a dormant payload into an agent's long-term memory via a single untrusted tool call (e.g., a crafted email), which activates only when the user later discusses sensitive topics such