[2602.22413] Epistemic Filtering and Collective Hallucination: A Jury Theorem for Confidence-Calibrated Agents

arXiv - AI · 3 min read

Summary

This paper develops a probabilistic framework for collective decision-making among agents that estimate their own reliability and can selectively abstain from voting, with the goal of improving the accuracy of collective decisions in AI systems.

Why It Matters

The research addresses the challenges of collective decision-making in AI, particularly mitigating hallucinations in large language models (LLMs). By letting agents abstain when their confidence is low, the proposed framework could improve the reliability of AI outputs, which is crucial for AI safety and trustworthiness.

Key Takeaways

  • Introduces a framework for agents to assess their reliability before participating in decisions.
  • Demonstrates how selective participation can enhance collective decision-making accuracy.
  • Validates the theoretical bounds through Monte Carlo simulations (a toy version is sketched after this list).
  • Discusses implications for reducing hallucinations in AI systems.
  • Generalizes classical voting results, such as the Condorcet Jury Theorem, to modern AI contexts.

Computer Science > Artificial Intelligence
arXiv:2602.22413 (cs) [Submitted on 25 Feb 2026]

Title: Epistemic Filtering and Collective Hallucination: A Jury Theorem for Confidence-Calibrated Agents
Authors: Jonas Karge

Abstract: We investigate the collective accuracy of heterogeneous agents who learn to estimate their own reliability over time and selectively abstain from voting. While classical epistemic voting results, such as the Condorcet Jury Theorem (CJT), assume fixed participation, real-world aggregation often benefits from allowing agents to say "I don't know." We propose a probabilistic framework where agents engage in a calibration phase, updating beliefs about their own fixed competence, before facing a final confidence gate that determines whether to vote or abstain. We derive a non-asymptotic lower bound on the group's success probability and prove that this selective participation generalizes the asymptotic guarantees of the CJT to a sequential, confidence-gated setting. Empirically, we validate these bounds via Monte Carlo simulations. While our results are general, we discuss their potential application to AI safety, outlining how this framework can mitigate hallucinations in collective LLM decision-making.

Subjects: Artificial Intelligence (cs.AI)

Related Articles

Machine Learning

[D] I had an idea, would love your thoughts

What happens if, while training an AI during pre-training, we make it such that if it shows "misaligned behaviour" then we just reduce like ...

Reddit - Machine Learning · 1 min ·
AI Safety

Newsom signs executive order requiring AI companies to have safety, privacy guardrails

Reddit - Artificial Intelligence · 1 min ·
AI Safety

[2511.16417] Pharos-ESG: A Framework for Multimodal Parsing, Contextual Narration, and Hierarchical Labeling of ESG Report

Abstract page for arXiv paper 2511.16417: Pharos-ESG: A Framework for Multimodal Parsing, Contextual Narration, and Hierarchical Labeling...

arXiv - AI · 4 min ·