[2603.25450] Cross-Model Disagreement as a Label-Free Correctness Signal
Computer Science > Artificial Intelligence
arXiv:2603.25450 (cs) [Submitted on 26 Mar 2026]

Title: Cross-Model Disagreement as a Label-Free Correctness Signal
Authors: Matt Gorbett, Suman Jana

Abstract: Detecting when a language model is wrong without ground truth labels is a fundamental challenge for safe deployment. Existing approaches rely on a model's own uncertainty -- such as token entropy or confidence scores -- but these signals fail critically on the most dangerous failure mode: confident errors, where a model is wrong but certain. In this work we introduce cross-model disagreement as a correctness indicator -- a simple, training-free signal that can be dropped into existing production systems, routing pipelines, and deployment monitoring infrastructure without modification. Given a model's generated answer, cross-model disagreement computes how surprised or uncertain a second verifier model is when reading that answer via a single forward pass. No generation from the verifying model is required, and no correctness labels are needed. We instantiate this principle as Cross-Model Perplexity (CMP), which measures the verifying model's surprise at the generating model's answer tokens, and Cross-Model Entropy (CME), which measures the verifying model's uncertainty at those positions. Both CMP and CME outperform within...
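The two signals described in the abstract can be sketched numerically. Below is a minimal, hedged illustration of how CMP and CME might be computed from a verifier model's output logits at the generator's answer-token positions; the paper's exact definitions and normalizations may differ, and the logits and token ids here are synthetic stand-ins for a real verifier's single forward pass.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def cross_model_perplexity(verifier_logits, answer_ids):
    # CMP sketch: exponentiated mean negative log-likelihood the verifier
    # assigns to the generator's answer tokens (higher = more "surprised").
    logp = log_softmax(verifier_logits)
    token_logp = logp[np.arange(len(answer_ids)), answer_ids]
    return float(np.exp(-token_logp.mean()))

def cross_model_entropy(verifier_logits):
    # CME sketch: the verifier's mean predictive entropy at the answer
    # positions (higher = more uncertain about what token belongs there).
    logp = log_softmax(verifier_logits)
    return float(-(np.exp(logp) * logp).sum(axis=-1).mean())

# Toy example: 3 answer positions over a 5-token vocabulary.
rng = np.random.default_rng(0)
verifier_logits = rng.normal(size=(3, 5))   # stand-in for a forward pass
answer_ids = np.array([1, 4, 2])            # stand-in for answer tokens
cmp_score = cross_model_perplexity(verifier_logits, answer_ids)
cme_score = cross_model_entropy(verifier_logits)
```

In a real deployment, `verifier_logits` would come from one forward pass of the verifier over the prompt plus the generator's answer, with no decoding from the verifier, which is what makes the signal cheap to drop into an existing pipeline.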