[2603.22879] Confidence Calibration under Ambiguous Ground Truth
Computer Science > Machine Learning
arXiv:2603.22879 (cs)
[Submitted on 24 Mar 2026]

Title: Confidence Calibration under Ambiguous Ground Truth
Authors: Linwei Tao, Haoyang Luo, Minjing Dong, Chang Xu

Abstract: Confidence calibration assumes a unique ground-truth label per input, yet this assumption fails wherever annotators genuinely disagree. Post-hoc calibrators fitted on majority-voted labels, the standard single-label targets used in practice, can appear well-calibrated under conventional evaluation yet remain substantially miscalibrated against the underlying annotator distribution. We show that this failure is structural: under simplifying assumptions, Temperature Scaling is biased toward temperatures that underestimate annotator uncertainty, with true-label miscalibration increasing monotonically with annotation entropy. To address this, we develop a family of ambiguity-aware post-hoc calibrators that optimise proper scoring rules against the full label distribution and require no model retraining. Our methods span progressively weaker annotation requirements: Dirichlet-Soft leverages the full annotator distribution and achieves the best overall calibration quality across settings; Monte Carlo Temperature Scaling with a single annotation per example (MCTS S=1) matches full-distribution calibration across all benchmarks, dem...
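To make the core idea concrete, here is a minimal sketch of fitting a post-hoc temperature against a soft annotator label distribution by minimising cross-entropy (a proper scoring rule). This is an illustrative reconstruction, not the paper's implementation: the function names, the grid-search optimiser, and the toy data (soft labels generated from the logits at a known "true" temperature of 2.5) are all assumptions for the example.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with a max-shift for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def soft_cross_entropy(logits, soft_labels, T):
    """Mean cross-entropy of temperature-scaled softmax against soft labels.

    Cross-entropy against the full label distribution is a proper scoring
    rule, so it is minimised when the calibrated probabilities match the
    annotator distribution (rather than a collapsed majority-vote label).
    """
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -(soft_labels * log_probs).sum(axis=1).mean()

def fit_temperature(logits, soft_labels, grid=np.linspace(0.1, 10.0, 500)):
    """Pick the temperature on a 1-D grid minimising soft-label CE.

    A simple grid search stands in for whatever optimiser the paper uses;
    the objective is 1-D, so this is adequate for illustration.
    """
    losses = [soft_cross_entropy(logits, soft_labels, T) for T in grid]
    return float(grid[int(np.argmin(losses))])

# Toy example (hypothetical data): 3-class logits, with soft labels
# synthesised as softmax(logits / 2.5), so the optimal temperature is 2.5.
rng = np.random.default_rng(0)
logits = rng.normal(size=(200, 3)) * 3.0
soft_labels = softmax(logits / 2.5)
T_star = fit_temperature(logits, soft_labels)
```

Because cross-entropy against the soft labels decomposes as their entropy plus a KL term, the fitted `T_star` recovers the generating temperature here; with hard majority-vote labels on the same data, the minimiser would instead be pulled toward a lower temperature, which is the bias the abstract describes.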