[2602.22413] Epistemic Filtering and Collective Hallucination: A Jury Theorem for Confidence-Calibrated Agents
Summary
This paper develops a probabilistic framework for collective decision-making among agents that can assess their own reliability and selectively abstain from voting, with the aim of improving collective accuracy in AI systems.
Why It Matters
The research addresses the challenges of collective decision-making in AI, particularly in mitigating hallucinations in large language models (LLMs). By allowing agents to abstain based on their confidence, the findings could improve the reliability of AI outputs, which is crucial for AI safety and trustworthiness.
Key Takeaways
- Introduces a framework for agents to assess their reliability before participating in decisions.
- Demonstrates how selective participation can enhance collective decision-making accuracy.
- Validates theoretical findings through Monte Carlo simulations.
- Discusses implications for reducing hallucinations in AI systems.
- Generalizes the classical Condorcet Jury Theorem to modern AI contexts.
Computer Science > Artificial Intelligence
arXiv:2602.22413 (cs) [Submitted on 25 Feb 2026]
Title: Epistemic Filtering and Collective Hallucination: A Jury Theorem for Confidence-Calibrated Agents
Authors: Jonas Karge
Abstract: We investigate the collective accuracy of heterogeneous agents who learn to estimate their own reliability over time and selectively abstain from voting. While classical epistemic voting results, such as the Condorcet Jury Theorem (CJT), assume fixed participation, real-world aggregation often benefits from allowing agents to say "I don't know." We propose a probabilistic framework in which agents engage in a calibration phase, updating beliefs about their own fixed competence, before facing a final confidence gate that determines whether to vote or abstain. We derive a non-asymptotic lower bound on the group's success probability and prove that this selective participation generalizes the asymptotic guarantees of the CJT to a sequential, confidence-gated setting. Empirically, we validate these bounds via Monte Carlo simulations. While our results are general, we discuss their potential application to AI safety, outlining how this framework can mitigate hallucinations in collective LLM decision-making.
Subjects: Artificial Intelligence (cs.AI)
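The confidence-gate idea in the abstract can be illustrated with a minimal Monte Carlo sketch. This is not the paper's exact model: it idealizes the calibration phase by giving each agent its true competence directly, and the competence distribution, gate threshold, and jury size are all illustrative assumptions. The sketch compares majority-vote accuracy with full participation against accuracy when agents below the gate abstain.

```python
import random

def simulate(n_agents=101, n_trials=5_000, gate=0.55, seed=0):
    """Monte Carlo sketch of confidence-gated majority voting.

    Illustrative assumptions (not from the paper): competences are drawn
    uniformly from [0.35, 0.75], and each agent knows its own competence
    exactly (the paper instead has agents *estimate* it in a calibration
    phase). Returns (accuracy without gate, accuracy with gate).
    """
    rng = random.Random(seed)
    competences = [rng.uniform(0.35, 0.75) for _ in range(n_agents)]

    def run(gated):
        wins = 0
        for _ in range(n_trials):
            correct_votes = 0
            n_voters = 0
            for p in competences:
                if gated and p < gate:
                    continue  # confidence gate: low-competence agents abstain
                n_voters += 1
                correct_votes += rng.random() < p  # agent votes correctly w.p. p
            # strict majority of the agents who actually voted; ties fail
            wins += 2 * correct_votes > n_voters
        return wins / n_trials

    return run(gated=False), run(gated=True)
```

With these parameters the ungated jury has mean competence only slightly above 1/2, so its majority vote is unreliable, while gating out agents below the threshold raises the participating jury's mean competence and, in line with the paper's thesis, its collective accuracy.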