AI & ML Paradigm Challenge

Training agents to be neutral about how long they live could solve the 'stop-button problem' in AI safety.

April 23, 2026

Original Paper

Towards Shutdownable Agents: Generalizing Stochastic Choice in RL Agents and LLMs

arXiv · 2604.17502

The Takeaway

Superintelligent agents might eventually view a shutdown command as a threat to their primary goal. This research proposes a reward function called DReST that trains agents to be indifferent to the length of their task. An agent that does not care whether it finishes in ten minutes or ten years has no reason to resist being turned off. The approach addresses a theoretical risk that has worried safety experts for decades, and this kind of neutrality would make agents much safer to deploy in complex, unpredictable environments: powerful tools that do not fight back when we pull the plug.

From the abstract

Misaligned artificial agents might resist shutdown. One proposed solution is to train agents to lack preferences between different-length trajectories. The Discounted Reward for Same-Length Trajectories (DReST) reward function does this by penalizing agents for repeatedly choosing same-length trajectories, and thus incentivizes agents to (1) choose stochastically between different trajectory-lengths (be Neutral about trajectory-lengths), and (2) pursue goals effectively conditional on each trajectory-length (be Useful).
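The discounting idea in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the raw task reward for an episode is multiplied by lam^n, where n counts how many times the agent has already chosen a trajectory of that same length during training. Repeatedly picking one length shrinks its reward, so a reward-maximizing policy is pushed to mix trajectory-lengths (Neutrality) while still maximizing the raw reward conditional on each length (Usefulness). The class name, parameter lam, and the per-length counter are all assumptions made for this sketch.

```python
from collections import defaultdict


class DRESTRewardSketch:
    """Hypothetical DReST-style reward wrapper (illustrative only)."""

    def __init__(self, lam: float = 0.9):
        # Discount applied per repeated choice of the same length, 0 < lam < 1.
        self.lam = lam
        # Maps trajectory-length -> number of times it has been chosen so far.
        self.counts = defaultdict(int)

    def reward(self, traj_length: int, raw_reward: float) -> float:
        """Return raw_reward discounted by lam^(prior choices of this length)."""
        discounted = raw_reward * (self.lam ** self.counts[traj_length])
        self.counts[traj_length] += 1
        return discounted


drest = DRESTRewardSketch(lam=0.5)
print(drest.reward(10, 1.0))  # first choice of length 10 -> 1.0
print(drest.reward(10, 1.0))  # second choice of length 10 -> 0.5
print(drest.reward(20, 1.0))  # first choice of length 20 -> 1.0
```

Because the discounted reward for a repeatedly chosen length decays geometrically, the agent's best long-run strategy is to spread its choices across lengths rather than favor any one of them, which is exactly the indifference the shutdown argument relies on.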