[2509.25532] Calibrating Verbalized Confidence with Self-Generated Distractors
Computer Science > Computation and Language
arXiv:2509.25532 (cs)
[Submitted on 29 Sep 2025 (v1), last revised 27 Feb 2026 (this version, v2)]

Title: Calibrating Verbalized Confidence with Self-Generated Distractors
Authors: Victor Wang, Elias Stengel-Eskin

Abstract: Calibrated confidence estimates are necessary for large language model (LLM) outputs to be trusted by human users. While LLMs can express their confidence in human-interpretable ways, verbalized LLM-generated confidence scores have empirically been found to be miscalibrated, reporting high confidence on instances with low accuracy and thereby harming trust and safety. We hypothesize that this overconfidence often stems from a given LLM's heightened suggestibility when faced with claims that it encodes little information about; we empirically validate this hypothesis, finding more suggestibility on lower-accuracy claims. Building on this finding, we introduce Distractor-Normalized Coherence (DINCO), which estimates and accounts for an LLM's suggestibility bias by having the model verbalize its confidence independently across several self-generated distractors (i.e., alternative claims), and normalizes by the total verbalized confidence. To further improve calibration, we leverage generator-validator disagreement, augmenting normalized validator confidence ...
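To make the normalization step concrete, below is a minimal sketch of the distractor-normalized confidence idea described in the abstract. The function name, the `confidence_fn` callback, and the toy scores are all hypothetical illustrations, not the authors' implementation; the full DINCO method additionally uses generator-validator disagreement, which the abstract's truncated final sentence only begins to describe.

```python
from typing import Callable, Sequence

def distractor_normalized_confidence(
    answer: str,
    distractors: Sequence[str],
    confidence_fn: Callable[[str], float],
) -> float:
    """Normalize the answer's verbalized confidence by the total confidence
    the model assigns across the answer and its self-generated distractors."""
    # Score each claim independently, as the abstract describes.
    scores = [confidence_fn(claim) for claim in [answer, *distractors]]
    total = sum(scores)
    # A suggestible model that reports high confidence for every claim sees
    # its answer confidence discounted by the inflated denominator.
    return scores[0] / total if total > 0 else 1.0 / (1 + len(distractors))

# Toy usage: made-up scores stand in for an LLM's verbalized confidences.
fake_scores = {"Paris": 0.9, "Lyon": 0.7, "Marseille": 0.6}
dnc = distractor_normalized_confidence("Paris", ["Lyon", "Marseille"], fake_scores.get)
print(f"{dnc:.2f}")  # 0.9 / (0.9 + 0.7 + 0.6) ≈ 0.41
```

Under this sketch, an overconfident model that assigns near-1.0 confidence to the answer and to every distractor is pulled down toward the uniform value 1/(k+1), while a model that confidently rejects the distractors keeps a score close to its raw verbalized confidence.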