AI & ML Practical Magic

A multi-agent AI pipeline successfully found real-world security flaws in the ISO C++ standard that human experts missed for years.

April 23, 2026

Original Paper

Refute-or-Promote: An Adversarial Stage-Gated Multi-Agent Review Methodology for High-Precision LLM-Assisted Defect Discovery

arXiv · 2604.19049

The Takeaway

The Refute-or-Promote methodology turns hallucination-prone LLMs into high-precision security tools. By combining adversarial kill mandates with empirical testing, the system filters out false positives and promotes only validated bug reports. This process resulted in four new CVEs and several accepted corrections to international standards. It demonstrates that AI can be trusted with critical infrastructure when it is placed inside a rigorous, multi-agent adversarial loop, and the architecture offers a template for using AI to harden the world's most important software systems.

From the abstract

LLM-assisted defect discovery has a precision crisis: plausible-but-wrong reports overwhelm maintainers and degrade credibility for real findings. We present Refute-or-Promote, an inference-time reliability pattern combining Stratified Context Hunting (SCH) for candidate generation, adversarial kill mandates, context asymmetry, and a Cross-Model Critic (CMC). Adversarial agents attempt to disprove candidates at each promotion gate; cold-start reviewers are intended to reduce anchoring cascades;
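The core control flow described in the abstract can be sketched as a stage-gated pipeline: each candidate defect passes through a sequence of adversarial gates, any one of which may refute (kill) it, and only candidates that survive every gate are promoted. The sketch below is illustrative only; the names (`Candidate`, `needs_evidence`, `empirical_check`) and the gate logic are assumptions, not the paper's actual implementation, and the toy gates stand in for LLM adversaries and reproduction harnesses.

```python
# Hedged sketch of a Refute-or-Promote stage-gated loop.
# All class and function names here are illustrative assumptions,
# not the paper's API.
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Candidate:
    claim: str
    evidence: str
    history: List[str] = field(default_factory=list)

# A "refuter" returns a refutation string if it can kill the candidate,
# or None if the candidate survives this gate.
Refuter = Callable[[Candidate], Optional[str]]

def refute_or_promote(candidate: Candidate, gates: List[Refuter]) -> bool:
    """Promote a candidate only if every adversarial gate fails to refute it."""
    for i, refuter in enumerate(gates):
        refutation = refuter(candidate)
        if refutation is not None:
            candidate.history.append(f"killed at gate {i}: {refutation}")
            return False  # refuted: drop, never reported to maintainers
        candidate.history.append(f"survived gate {i}")
    return True  # survived all kill mandates: promote to human review

# Toy gates standing in for LLM adversaries and empirical tests.
def needs_evidence(c: Candidate) -> Optional[str]:
    return None if c.evidence else "no supporting evidence"

def empirical_check(c: Candidate) -> Optional[str]:
    # In the real pipeline this would run a reproducer or test harness.
    return None if "reproduced" in c.evidence else "could not reproduce"

good = Candidate("off-by-one in parser", "reproduced with fuzz input")
bad = Candidate("speculative UB claim", "")
assert refute_or_promote(good, [needs_evidence, empirical_check]) is True
assert refute_or_promote(bad, [needs_evidence, empirical_check]) is False
```

The key design property this captures is asymmetry of outcomes: a single successful refutation permanently kills a candidate, so precision is prioritized over recall, which matches the paper's stated goal of avoiding plausible-but-wrong reports.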