ThoughtSteer demonstrates the first successful backdoor attack on continuous latent reasoning models that leave no token-based audit trail.
April 2, 2026
Original Paper
Thinking Wrong in Silence: Backdoor Attacks on Continuous Latent Reasoning
arXiv · 2604.00770
The Takeaway
As models move toward 'silent' reasoning (like Coconut or SimCoT), traditional token-level monitoring fails. This paper identifies a new attack surface where latent trajectories can be hijacked, proving that security must move from text-filtering to latent-space monitoring.
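To make the idea of latent-space monitoring concrete, here is a minimal sketch (not from the paper; the baseline data, threshold, and function names are illustrative assumptions): it fits a Gaussian to hidden states from clean reasoning runs and flags a trajectory whose states drift far from that distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # hypothetical latent dimension

# Assumed baseline: latent states collected from clean reasoning runs.
clean_states = rng.normal(0.0, 1.0, size=(500, d))
mu = clean_states.mean(axis=0)
cov = np.cov(clean_states, rowvar=False) + 1e-6 * np.eye(d)
cov_inv = np.linalg.inv(cov)

def mahalanobis(h):
    """Distance of one latent state from the clean-state distribution."""
    diff = h - mu
    return float(np.sqrt(diff @ cov_inv @ diff))

def flag_trajectory(states, threshold):
    """Flag a latent trajectory if any step leaves the clean manifold."""
    return any(mahalanobis(h) > threshold for h in states)

clean_traj = rng.normal(0.0, 1.0, size=(8, d))
hijacked_traj = clean_traj + 6.0        # large systematic latent drift

threshold = 2 * np.sqrt(d)              # loose heuristic cutoff
print(flag_trajectory(clean_traj, threshold),
      flag_trajectory(hijacked_traj, threshold))  # False True
```

A real monitor would use states from the actual model and a calibrated threshold; the point is only that the defense operates on hidden vectors, not emitted tokens.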
From the abstract
A new generation of language models reasons entirely in continuous hidden states, producing no tokens and leaving no audit trail. We show that this silence creates a fundamentally new attack surface. ThoughtSteer perturbs a single embedding vector at the input layer; the model's own multi-pass reasoning amplifies this perturbation into a hijacked latent trajectory that reliably produces the attacker's chosen answer, while remaining structurally invisible to every token-level defense. Across two arch…
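The amplification dynamic the abstract describes can be illustrated with a toy model (again an assumption, not the paper's method): if each latent reasoning pass applies a map with gain above 1, a tiny perturbation of the input embedding grows multiplicatively with the number of passes.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 32  # hypothetical embedding dimension

# Toy stand-in for one latent reasoning pass: a rotation scaled by a
# gain > 1, so every pass amplifies any input perturbation.
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
W = 1.2 * Q

def latent_reasoning(h, passes=12):
    """Feed the hidden state back through the map for several passes."""
    for _ in range(passes):
        h = W @ h
    return h

x = rng.normal(0.0, 0.1, size=d)       # clean input embedding
delta = 0.01 * rng.normal(size=d)      # tiny trigger perturbation
drift = np.linalg.norm(latent_reasoning(x + delta) - latent_reasoning(x))
print(round(drift / np.linalg.norm(delta), 1))  # 1.2**12, about 8.9
```

Because the map is linear here, the growth factor is exactly the gain raised to the number of passes; in a real model the dynamics are nonlinear, but the same multi-pass feedback is what turns a single perturbed vector into a hijacked trajectory.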