ThoughtSteer demonstrates the first successful backdoor attack on continuous latent reasoning models that leave no token-based audit trail.
April 2, 2026
Original Paper
Thinking Wrong in Silence: Backdoor Attacks on Continuous Latent Reasoning
arXiv · 2604.00770
The Takeaway
As models move toward 'silent' reasoning (like Coconut or SimCoT), traditional token-level monitoring fails. This paper identifies a new attack surface where latent trajectories can be hijacked, proving that security must move from text-filtering to latent-space monitoring.
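To make the idea of latent-space monitoring concrete, here is a minimal sketch (not from the paper; the baseline data, threshold, and function names are illustrative assumptions): it fits a Gaussian to hidden states from clean reasoning runs and flags a trajectory whose states drift far from that distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # hypothetical latent dimension

# Assumed baseline: latent states collected from clean reasoning runs.
clean_states = rng.normal(0.0, 1.0, size=(500, d))
mu = clean_states.mean(axis=0)
cov = np.cov(clean_states, rowvar=False) + 1e-6 * np.eye(d)
cov_inv = np.linalg.inv(cov)

def mahalanobis(h):
    """Distance of one latent state from the clean-state distribution."""
    diff = h - mu
    return float(np.sqrt(diff @ cov_inv @ diff))

def flag_trajectory(states, threshold):
    """Flag a latent trajectory if any step leaves the clean manifold."""
    return any(mahalanobis(h) > threshold for h in states)

clean_traj = rng.normal(0.0, 1.0, size=(8, d))
hijacked_traj = clean_traj + 6.0        # large systematic latent drift

threshold = 2 * np.sqrt(d)              # loose heuristic cutoff
print(flag_trajectory(clean_traj, threshold),
      flag_trajectory(hijacked_traj, threshold))  # False True
```

A real monitor would use states from the actual model and a calibrated threshold; the point is only that the defense operates on hidden vectors, not emitted tokens.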
From the abstract
A new generation of language models reasons entirely in continuous hidden states, producing no tokens and leaving no audit trail. We show that this silence creates a fundamentally new attack surface. ThoughtSteer perturbs a single embedding vector at the input layer; the model's own multi-pass reasoning amplifies this perturbation into a hijacked latent trajectory that reliably produces the attacker's chosen answer, while remaining structurally invisible to every token-level defense. Across two arch…
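The amplification dynamic the abstract describes can be illustrated with a toy model (again an assumption, not the paper's method): if each latent reasoning pass applies a map with gain above 1, a tiny perturbation of the input embedding grows multiplicatively with the number of passes.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 32  # hypothetical embedding dimension

# Toy stand-in for one latent reasoning pass: a rotation scaled by a
# gain > 1, so every pass amplifies any input perturbation.
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
W = 1.2 * Q

def latent_reasoning(h, passes=12):
    """Feed the hidden state back through the map for several passes."""
    for _ in range(passes):
        h = W @ h
    return h

x = rng.normal(0.0, 0.1, size=d)       # clean input embedding
delta = 0.01 * rng.normal(size=d)      # tiny trigger perturbation
drift = np.linalg.norm(latent_reasoning(x + delta) - latent_reasoning(x))
print(round(drift / np.linalg.norm(delta), 1))  # 1.2**12, about 8.9
```

Because the map is linear here, the growth factor is exactly the gain raised to the number of passes; in a real model the dynamics are nonlinear, but the same multi-pass feedback is what turns a single perturbed vector into a hijacked trajectory.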