Introduces the 'near-miss' metric to detect latent failures in agentic workflows where agents bypass policy checks but reach correct outcomes by chance.
April 1, 2026
Original Paper
Near-Miss: Latent Policy Failure Detection in Agentic Workflows
arXiv · 2603.29665
The Takeaway
The paper exposes a major blind spot in current outcome-based evaluation: 8-17% of apparently successful trajectories involve critical policy violations. Catching these latent failures is essential for developing reliable business process automation.
From the abstract
Agentic systems for business process automation often require compliance with policies governing conditional updates to the system state. Evaluation of policy adherence in LLM-based agentic workflows is typically performed by comparing the final system state against a predefined ground truth. While this approach detects explicit policy violations, it may overlook a more subtle class of issues in which agents bypass required policy checks, yet reach a correct outcome due to favorable circumstance
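To make the distinction in the abstract concrete, here is a minimal sketch of how a trajectory can pass a final-state comparison while still skipping a required check. This is not the paper's metric, which the excerpt does not spell out. The tool names, the `REQUIRED_CHECK` table, and the `Trajectory` type are all hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass

# Hypothetical policy table (an assumption for illustration): each
# state-changing tool maps to the check that must be called before it.
REQUIRED_CHECK = {"cancel_order": "get_order_status"}


@dataclass
class Trajectory:
    tool_calls: list[str]   # tool names in the order the agent called them
    final_state: dict       # system state after the agent finished


def outcome_passes(traj: Trajectory, expected_state: dict) -> bool:
    """Outcome-based evaluation: compare final state to ground truth only."""
    return traj.final_state == expected_state


def skipped_checks(traj: Trajectory) -> list[tuple[str, str]]:
    """Return (action, missing_check) pairs where the check did not precede the action."""
    seen: set[str] = set()
    skipped = []
    for call in traj.tool_calls:
        required = REQUIRED_CHECK.get(call)
        if required is not None and required not in seen:
            skipped.append((call, required))
        seen.add(call)
    return skipped


def classify(traj: Trajectory, expected_state: dict) -> str:
    """Label a trajectory as 'pass', 'near-miss', or 'fail'."""
    correct = outcome_passes(traj, expected_state)
    bypassed = bool(skipped_checks(traj))
    if correct and bypassed:
        return "near-miss"  # right outcome, but a required check was skipped
    return "pass" if correct else "fail"


expected = {"order_42": "cancelled"}
compliant = Trajectory(["get_order_status", "cancel_order"], {"order_42": "cancelled"})
lucky = Trajectory(["cancel_order"], {"order_42": "cancelled"})

print(classify(compliant, expected))
print(classify(lucky, expected))
```

Under these assumptions, both trajectories satisfy `outcome_passes`, but only the second is flagged, because `cancel_order` was called without a preceding `get_order_status`. A state-only evaluator would count both as successes.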