
Cross-model disagreement (CMP/CME) provides a highly effective, label-free signal for detecting confident hallucinations.

March 27, 2026

Original Paper

Cross-Model Disagreement as a Label-Free Correctness Signal

Matt Gorbett, Suman Jana

arXiv · 2603.25450

The Takeaway

Detecting errors when a model is 'confidently wrong' usually requires expensive external labels or consensus voting. This paper shows that simply measuring a second model's 'surprise' at the primary model's output is a robust, training-free way to monitor deployment reliability in real time.

From the abstract

Detecting when a language model is wrong without ground truth labels is a fundamental challenge for safe deployment. Existing approaches rely on a model's own uncertainty -- such as token entropy or confidence scores -- but these signals fail critically on the most dangerous failure mode: confident errors, where a model is wrong but certain. In this work we introduce cross-model disagreement as a correctness indicator -- a simple, training-free signal that can be dropped into existing production …
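To make the mechanism concrete, here is a minimal sketch of the idea of "a second model's surprise" as a flag for confident errors. This is an illustration under assumptions, not the paper's implementation: it assumes the verifier's disagreement is scored as the average negative log-probability it assigns to the primary model's output tokens, and the function names and thresholds (`surprise_score`, `flag_confident_error`, the 0.9/2.0 cutoffs) are hypothetical.

```python
import math

def surprise_score(verifier_token_logprobs):
    """Average negative log-probability the verifier model assigns to the
    primary model's output tokens. Higher = the verifier is more surprised,
    i.e. stronger cross-model disagreement. (Assumed scoring rule.)"""
    return -sum(verifier_token_logprobs) / len(verifier_token_logprobs)

def flag_confident_error(primary_confidence, verifier_token_logprobs,
                         conf_thresh=0.9, surprise_thresh=2.0):
    """Flag the dangerous regime: the primary model is confident in its
    output, yet a second model finds that output improbable.
    Thresholds here are illustrative, not from the paper."""
    return (primary_confidence >= conf_thresh
            and surprise_score(verifier_token_logprobs) >= surprise_thresh)

# Toy usage with made-up per-token log-probs from a verifier model:
agreeing = [-0.1, -0.2, -0.1, -0.1]   # verifier finds output likely
surprised = [-3.0, -2.5, -4.0, -3.5]  # verifier finds output improbable

print(flag_confident_error(0.95, agreeing))   # confident + unsurprising -> OK
print(flag_confident_error(0.95, surprised))  # confident + surprising -> flag
```

In practice the verifier's per-token log-probs would come from scoring the primary model's text with a second model's forward pass; the point of the sketch is that the monitor needs no labels and no training, just two models and a threshold.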