Cross-model disagreement provides an effective, label-free signal for detecting confident hallucinations.
March 27, 2026
Original Paper
Cross-Model Disagreement as a Label-Free Correctness Signal
arXiv · 2603.25450
The Takeaway
Detecting errors when a model is 'confidently wrong' usually requires expensive external labels or consensus voting. This paper shows that simply measuring a second model's 'surprise' at the primary model's output is a robust, training-free way to monitor deployment reliability in real-time.
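The "surprise" idea can be sketched numerically: take the per-token probabilities a second model assigns to the primary model's output and compute the average surprisal (negative log-probability); a high value means the verifier disagrees. The function names, the toy probabilities, and the threshold below are illustrative assumptions, not the paper's implementation.

```python
import math

def mean_surprisal(token_probs):
    """Average negative log-probability (nats per token) that a second
    'verifier' model assigns to the primary model's output tokens."""
    if not token_probs:
        raise ValueError("empty token sequence")
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

def flag_disagreement(token_probs, threshold=2.0):
    """Flag the output as suspect when the verifier's mean surprisal
    exceeds a threshold (the value 2.0 here is purely illustrative)."""
    return mean_surprisal(token_probs) > threshold

# Toy per-token probabilities a verifier might assign to the output.
agreed = [0.9, 0.8, 0.95, 0.85]    # verifier finds the output likely
disputed = [0.05, 0.1, 0.02, 0.2]  # verifier is 'surprised'

print(flag_disagreement(agreed))    # low surprisal -> False
print(flag_disagreement(disputed))  # high surprisal -> True
```

In a real deployment the probabilities would come from scoring the primary model's tokens under the verifier model's next-token distribution; the threshold would be tuned on held-out data.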
From the abstract
Detecting when a language model is wrong without ground truth labels is a fundamental challenge for safe deployment. Existing approaches rely on a model's own uncertainty -- such as token entropy or confidence scores -- but these signals fail critically on the most dangerous failure mode: confident errors, where a model is wrong but certain. In this work we introduce cross-model disagreement as a correctness indicator -- a simple, training-free signal that can be dropped into existing production …