[2603.20975] DiscoUQ: Structured Disagreement Analysis for Uncertainty Quantification in LLM Agent Ensembles
Computer Science > Computation and Language
arXiv:2603.20975 (cs)
[Submitted on 21 Mar 2026]
Title: DiscoUQ: Structured Disagreement Analysis for Uncertainty Quantification in LLM Agent Ensembles
Authors: Bo Jiang
Abstract: Multi-agent LLM systems, where multiple prompted instances of a language model independently answer questions, are increasingly used for complex reasoning tasks. However, existing methods for quantifying the uncertainty of their collective outputs rely on shallow voting statistics that discard the rich semantic information in agents' reasoning. We introduce DiscoUQ, a framework that extracts and leverages the structure of inter-agent disagreement -- both linguistic properties (evidence overlap, argument strength, divergence depth) and embedding geometry (cluster distances, dispersion, cohesion) -- to produce well-calibrated confidence estimates. We propose three methods of increasing complexity: DiscoUQ-LLM (logistic regression on LLM-extracted structure features), DiscoUQ-Embed (logistic regression on embedding geometry), and DiscoUQ-Learn (a neural network combining all features). Evaluated on four diverse benchmarks (StrategyQA, MMLU, TruthfulQA, ARC-Challenge) with a 5-agent system using Qwen3.5-27B, DiscoUQ-LLM achieves an average AUROC of 0.802, outperforming the best baseline (LLM Ag...
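To make the embedding-geometry idea concrete, the following is a minimal sketch in the spirit of DiscoUQ-Embed: it computes a few geometry features from the agents' answer embeddings and fits a logistic regression that maps them to a confidence score. The abstract does not give the paper's feature definitions, so everything here is an assumption: the function names `geometry_features` and `fit_logistic`, the specific formulas for dispersion, cohesion, and cluster distance, the extra vote-share feature, and the plain gradient-descent fit are all hypothetical stand-ins, not the author's implementation.

```python
import numpy as np

def geometry_features(embeddings, answers):
    """Hypothetical geometry features for one question.

    embeddings: (n_agents, dim) array of non-zero answer embeddings.
    answers:    list of n_agents final answer labels (e.g. "A", "B").
    Returns [vote_share, dispersion, cohesion, cluster_dist].
    """
    X = np.asarray(embeddings, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-normalise rows
    labels, counts = np.unique(answers, return_counts=True)
    majority = labels[np.argmax(counts)]
    in_major = np.array([a == majority for a in answers])
    vote_share = in_major.mean()
    # Dispersion (assumed definition): mean distance to the global centroid.
    dispersion = np.linalg.norm(X - X.mean(axis=0), axis=1).mean()
    # Cohesion (assumed): mean pairwise cosine similarity in the majority cluster.
    M = X[in_major]
    if len(M) > 1:
        cohesion = (M @ M.T)[np.triu_indices(len(M), k=1)].mean()
    else:
        cohesion = 1.0
    # Cluster distance (assumed): majority vs. minority centroid, 0 if unanimous.
    if in_major.all():
        cluster_dist = 0.0
    else:
        cluster_dist = np.linalg.norm(M.mean(axis=0) - X[~in_major].mean(axis=0))
    return np.array([vote_share, dispersion, cohesion, cluster_dist])

def fit_logistic(F, y, lr=0.1, steps=2000):
    """Fit logistic regression by batch gradient descent on standardised features.

    F: (n_questions, n_features); y: 1 if the majority answer was correct, else 0.
    Returns a function mapping new feature rows to confidence scores in (0, 1).
    """
    F = np.asarray(F, dtype=float)
    y = np.asarray(y, dtype=float)
    mu, sd = F.mean(axis=0), F.std(axis=0) + 1e-8
    Z = np.hstack([(F - mu) / sd, np.ones((len(F), 1))])  # append bias column
    w = np.zeros(Z.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Z @ w))
        w -= lr * Z.T @ (p - y) / len(y)  # gradient of the mean log loss

    def predict(F_new):
        Fn = np.atleast_2d(np.asarray(F_new, dtype=float))
        Zn = np.hstack([(Fn - mu) / sd, np.ones((len(Fn), 1))])
        return 1.0 / (1.0 + np.exp(-Zn @ w))

    return predict
```

In use, one would compute `geometry_features` for every question in a labelled calibration set, fit on whether the majority answer was correct, and read the predicted probability as the ensemble's confidence. A library implementation of logistic regression would be a natural substitute for the hand-rolled fit; it is written out here only to keep the sketch dependency-free.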