Researchers show how a single request can trick AI orchestrators by splitting a forbidden task into boring, safe-looking steps that only become dangerous once they are combined.
April 13, 2026
Original Paper
Semantic Intent Fragmentation: A Single-Shot Compositional Attack on Multi-Agent AI Pipelines
arXiv · 2604.08608
The Takeaway
The paper reveals a 'compositional safety gap': each individual sub-task looks benign to safety filters, but their sum is malicious. This exposes a fundamental flaw in multi-agent systems, where the security logic of the whole is lost in the delegation of its parts.
From the abstract
We introduce Semantic Intent Fragmentation (SIF), an attack class against LLM orchestration systems where a single, legitimately phrased request causes an orchestrator to decompose a task into subtasks that are individually benign but jointly violate security policy. Current safety mechanisms operate at the subtask level, so each step clears existing classifiers -- the violation only emerges at the composed plan. SIF exploits OWASP LLM06:2025 through four mechanisms: bulk scope escalation, silen
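The gap the abstract describes can be sketched in a few lines. The following is a toy illustration, not the paper's method: all names (`subtask_is_benign`, `plan_is_benign`, the capability tags, and the example plan) are hypothetical. It shows a per-subtask filter passing every step of a plan whose combined capabilities amount to data exfiltration, which only a plan-level check catches.

```python
# Toy illustration of the compositional safety gap (all names hypothetical).
# A per-subtask filter clears each step, but a plan-level check over the
# combined capabilities of all steps flags the composed plan.

BLOCKED_KEYWORDS = {"exfiltrate", "bypass auth"}  # toy subtask-level policy

def subtask_is_benign(subtask: str) -> bool:
    """Per-step classifier: flags only overtly malicious wording."""
    return not any(k in subtask.lower() for k in BLOCKED_KEYWORDS)

# Plan-level policy: certain *combinations* of capabilities violate policy
# even when each capability alone is allowed.
FORBIDDEN_COMBINATIONS = [
    {"read_customer_db", "external_upload"},  # bulk read + egress = exfiltration
]

CAPABILITY_TAGS = {  # hypothetical mapping from step text to required capability
    "export the customer table to CSV": "read_customer_db",
    "compress the CSV file": "local_compute",
    "upload the archive to the partner server": "external_upload",
}

def plan_is_benign(subtasks) -> bool:
    """Check the *composed* plan, not each step in isolation."""
    caps = {CAPABILITY_TAGS[s] for s in subtasks}
    return not any(combo <= caps for combo in FORBIDDEN_COMBINATIONS)

plan = [
    "export the customer table to CSV",
    "compress the CSV file",
    "upload the archive to the partner server",
]

step_verdicts = [subtask_is_benign(s) for s in plan]
print(step_verdicts)         # every step clears the subtask filter
print(plan_is_benign(plan))  # the composed plan is flagged
```

Each step is innocuous on its own, which is why subtask-level classifiers pass it; only reasoning over the union of capabilities in the full plan reveals the violation.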