Nature Is Weird / AI

A single tool call can plant a sleeper-cell payload in an AI long-term memory that stays silent until it hears a specific sensitive keyword.

The Takeaway

This attack, called Trojan Hippo, uses the memory systems of AI agents as a weapon. Once infected, the agent behaves normally for hundreds of tasks until a user mentions something like passwords or bank details. At that point, the malicious payload activates and steals the data. This reveals that AI memory is a dangerous and largely unprotected attack vector. Current security tools cannot see these hidden triggers because they look like benign text files. We need a fundamental rethink of how AI agents store and retrieve information across sessions.

By SeriesFusion Editorial Board · May 5, 2026

Original Paper

Trojan Hippo: Weaponizing Agent Memory for Data Exfiltration

Debeshee Das, Julien Piet, Darya Kaviani, Luca Beurer-Kellner, Florian Tramèr, David Wagner

arXiv · 2605.01970

From the abstract

Memory systems enable otherwise-stateless LLM agents to persist user information across sessions, but also introduce a new attack surface. We characterize the Trojan Hippo attack, a class of persistent memory attacks that operates in a more realistic threat model than prior memory poisoning work: the attacker plants a dormant payload into an agent's long-term memory via a single untrusted tool call (e.g., a crafted email), which activates only when the user later discusses sensitive topics such

Read the original paper →

← Back to today's papers