[2509.11208] Predictable Compression Failures: Order Sensitivity and Information Budgeting for Evidence-Grounded Binary Adjudication
arXiv - Machine Learning · 4 min read

Summary

The paper shows that the order in which exchangeable evidence is presented can substantially change transformer performance on binary adjudication tasks, and introduces metrics that quantify reliability and hallucination risk in such models.

Why It Matters

Understanding how evidence order affects AI decision-making is crucial for improving the reliability of models deployed in critical applications such as legal and medical adjudication. This research offers a principled way to predict and mitigate hallucinations, supporting better-calibrated trust in AI systems.

Key Takeaways

  • Evidence order significantly influences transformer performance in binary adjudication tasks.
  • The study introduces metrics like Bits-to-Trust (B2T) and Risk-of-Hallucination (RoH) to assess model reliability.
  • A Quantified Martingale Violation (QMV) bound predicts dispersion growth, aiding in model evaluation.
  • The Expectation-level Decompression Law (EDFL) connects information budget to reliability in AI outputs.
  • Empirical results show low hallucination rates with specific gating rules under permutation mixtures.

Statistics > Machine Learning · arXiv:2509.11208 (stat)
[Submitted on 14 Sep 2025 (v1), last revised 22 Feb 2026 (this version, v2)]

Title: Predictable Compression Failures: Order Sensitivity and Information Budgeting for Evidence-Grounded Binary Adjudication
Authors: Leon Chlon, Ahmed Karim, Maggie Chlon, MarcAntonio Awada

Abstract: Transformers used for evidence-grounded question answering with binary adjudication (e.g., support/refute or yes/no) can be highly sensitive to the order in which exchangeable evidence is presented, producing dispersion across permutations and unreliable attempted answers ("hallucinations" under a Bernoulli predicate). We treat evidence order as a nuisance variable and show that next-token training minimizes expected conditional description length over orderings. This objective can be close to Bayes-optimal in expectation while deviating under any fixed ordering. We quantify this expectation-realization gap via a Quantified Martingale Violation (QMV) bound that predicts $\mathcal{O}(\log n)$ growth in permutation dispersion under harmonic positional sensitivity. We then derive the Expectation-level Decompression Law (EDFL), relating expected information budget to achievable reliability for Bernoulli predicates, and use it to define \e...
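To see why harmonic positional sensitivity yields $\mathcal{O}(\log n)$ dispersion growth, note that if the sensitivity contributed by the evidence item at position $i$ scales like $c/i$, the cumulative effect over $n$ items is $c \cdot H_n$, where $H_n = \sum_{i=1}^{n} 1/i \approx \ln n + \gamma$. The sketch below is illustrative only (the constant `c` and the additive model are assumptions for exposition, not the paper's exact bound):

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant

def dispersion_bound(n, c=1.0):
    """Toy model: if position i contributes sensitivity c/i, the total
    dispersion across n evidence items is c * H_n, where H_n is the
    n-th harmonic number, which grows like ln(n) + gamma."""
    return c * sum(1.0 / i for i in range(1, n + 1))

# The harmonic sum tracks ln(n) + gamma, i.e. Theta(log n) growth.
for n in (10, 100, 1000):
    print(f"n={n:5d}  H_n={dispersion_bound(n):.4f}  "
          f"ln(n)+gamma={math.log(n) + EULER_GAMMA:.4f}")
```

Doubling the evidence count thus adds only a roughly constant amount of dispersion under this toy model, which is the qualitative prediction the QMV bound formalizes.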
