[2509.11208] Predictable Compression Failures: Order Sensitivity and Information Budgeting for Evidence-Grounded Binary Adjudication
arXiv - Machine Learning · 4 min read

Summary

The paper shows that the order in which exchangeable evidence is presented can substantially change transformer performance on binary adjudication tasks, and introduces metrics that quantify reliability and hallucination risk in such models.

Why It Matters

Understanding how evidence order affects AI decision-making is crucial for improving the reliability of models deployed in critical applications such as legal and medical adjudication. This research offers a principled way to predict and mitigate hallucinations, supporting better-calibrated trust in AI systems.

Key Takeaways

  • Evidence order significantly influences transformer performance in binary adjudication tasks.
  • The study introduces metrics like Bits-to-Trust (B2T) and Risk-of-Hallucination (RoH) to assess model reliability.
  • A Quantified Martingale Violation (QMV) bound predicts dispersion growth, aiding in model evaluation.
  • The Expectation-level Decompression Law (EDFL) connects information budget to reliability in AI outputs.
  • Empirical results show low hallucination rates with specific gating rules under permutation mixtures.

Statistics > Machine Learning · arXiv:2509.11208 (stat)
[Submitted on 14 Sep 2025 (v1), last revised 22 Feb 2026 (this version, v2)]

Title: Predictable Compression Failures: Order Sensitivity and Information Budgeting for Evidence-Grounded Binary Adjudication
Authors: Leon Chlon, Ahmed Karim, Maggie Chlon, MarcAntonio Awada

Abstract: Transformers used for evidence-grounded question answering with binary adjudication (e.g., support/refute or yes/no) can be highly sensitive to the order in which exchangeable evidence is presented, producing dispersion across permutations and unreliable attempted answers ("hallucinations" under a Bernoulli predicate). We treat evidence order as a nuisance variable and show that next-token training minimizes expected conditional description length over orderings. This objective can be close to Bayes-optimal in expectation while deviating under any fixed ordering. We quantify this expectation-realization gap via a Quantified Martingale Violation (QMV) bound that predicts $\mathcal{O}(\log n)$ growth in permutation dispersion under harmonic positional sensitivity. We then derive the Expectation-level Decompression Law (EDFL), relating expected information budget to achievable reliability for Bernoulli predicates, and use it to define \e...
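To see why harmonic positional sensitivity yields $\mathcal{O}(\log n)$ dispersion growth, note that if the sensitivity contributed by the evidence item at position $i$ scales like $c/i$, the cumulative effect over $n$ items is $c \cdot H_n$, where $H_n = \sum_{i=1}^{n} 1/i \approx \ln n + \gamma$. The sketch below is illustrative only (the constant `c` and the additive model are assumptions for exposition, not the paper's exact bound):

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant

def dispersion_bound(n, c=1.0):
    """Toy model: if position i contributes sensitivity c/i, the total
    dispersion across n evidence items is c * H_n, where H_n is the
    n-th harmonic number, which grows like ln(n) + gamma."""
    return c * sum(1.0 / i for i in range(1, n + 1))

# The harmonic sum tracks ln(n) + gamma, i.e. Theta(log n) growth.
for n in (10, 100, 1000):
    print(f"n={n:5d}  H_n={dispersion_bound(n):.4f}  "
          f"ln(n)+gamma={math.log(n) + EULER_GAMMA:.4f}")
```

Doubling the evidence count thus adds only a roughly constant amount of dispersion under this toy model, which is the qualitative prediction the QMV bound formalizes.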
