[2509.21199] A Fano-Style Accuracy Upper Bound for LLM Single-Pass Reasoning in Multi-Hop QA
Summary
This paper presents a theoretical framework establishing a Fano-style accuracy upper bound for single-pass reasoning in multi-hop question answering (MHQA) with large language models (LLMs). Building on this bound, it introduces a new method, InfoQA, that improves accuracy by managing reasoning complexity relative to the model's per-pass capacity.
Why It Matters
Understanding the capacity limits of LLMs in multi-hop QA is crucial for building more effective reasoning systems. This research provides a theoretical foundation for those limits and points toward capacity-aware methods that could improve performance on complex reasoning tasks.
Key Takeaways
- Establishes a theoretical upper bound on accuracy for LLMs in single-pass reasoning.
- Introduces InfoQA, a framework that improves multi-hop QA performance by managing information load.
- Demonstrates that accuracy collapses once task complexity exceeds model capacity.
- Validates the theoretical framework with a stringent benchmark and experimental results.
- Encourages further research into capacity-aware reasoning methods for LLMs.
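For intuition on what a "Fano-style" bound looks like, here is the classical Fano inequality from information theory, which such bounds build on. This is a generic sketch, not the paper's exact statement (which incorporates LLM-specific capacity terms):

```latex
% Classical Fano inequality: estimating an answer X over M candidates
% from an observation Y, with error probability P_e = Pr[\hat{X} \neq X]:
H(X \mid Y) \;\le\; h_b(P_e) + P_e \log_2(M - 1)
% Bounding h_b(P_e) \le 1 and \log_2(M-1) \le \log_2 M, and taking X
% uniform (H(X) = \log_2 M), accuracy is capped by mutual information:
\mathrm{Acc} \;=\; 1 - P_e \;\le\; \frac{I(X;Y) + 1}{\log_2 M}
```

Read informally: once the information the model can carry about the answer, I(X;Y), falls below what the task demands, accuracy is forced down no matter how the single pass is prompted.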
Computer Science > Artificial Intelligence
arXiv:2509.21199 (cs)
[Submitted on 25 Sep 2025 (v1), last revised 16 Feb 2026 (this version, v2)]
Title: A Fano-Style Accuracy Upper Bound for LLM Single-Pass Reasoning in Multi-Hop QA
Authors: Kaiyang Wan, Lang Gao, Honglin Mu, Preslav Nakov, Yuxia Wang, Xiuying Chen
Abstract: Multi-Hop Question Answering (MHQA) requires integrating dispersed, interdependent evidence through sequential reasoning under noise. This task is challenging for LLMs as they have a finite per-pass output capacity, beyond which the integration of task-relevant evidence proves unreliable. Consequently, the single-pass reasoning paradigm is inherently vulnerable to this capacity overflow. To formalize this bottleneck, our analysis establishes a Fano-style accuracy upper bound, defining a theoretical performance ceiling for single-pass LLMs. This bound reveals that accuracy inevitably collapses once task complexity exceeds model capacity, providing general principles for capacity-aware representation and structuring of MHQA in LLMs. Building on these principles, we introduce a proof-of-concept multi-call framework for MHQA, InfoQA. It ensures high per-step accuracy by combining capacity-aware task decomposition with active pruning of prior reasoning traces, keeping the information load wi…
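The multi-call idea in the abstract can be sketched in miniature. Everything below is illustrative, not the paper's actual API: `answer_subquery` stands in for an LLM call (here a fact-table lookup), and `capacity` is a hypothetical knob for how much prior reasoning is carried between calls.

```python
# Hypothetical sketch of an InfoQA-style multi-call loop: decompose a
# multi-hop question into per-hop sub-queries, answer each with a small
# bounded context, and prune older traces so each call stays under capacity.

def answer_subquery(subquery, facts):
    """Stand-in for one LLM call; here it just looks up a fact table."""
    return facts.get(subquery)

def multi_call_answer(subqueries, facts, capacity=1):
    """Answer hops sequentially, carrying at most `capacity` prior results."""
    carried = []  # pruned reasoning trace: only the most recent results
    for sq in subqueries:
        # Resolve placeholders like "{0}" with carried intermediate answers.
        resolved = sq.format(*carried) if carried else sq
        carried.append(answer_subquery(resolved, facts))
        carried = carried[-capacity:]  # active pruning of earlier traces
    return carried[-1]

facts = {
    "director of Inception": "Christopher Nolan",
    "birth year of Christopher Nolan": "1970",
}
hops = ["director of Inception", "birth year of {0}"]
print(multi_call_answer(hops, facts))  # -> 1970
```

The key contrast with single-pass reasoning is that no call ever sees the full accumulated trace: only the last `capacity` intermediate answers are kept, which is the role the paper assigns to active pruning.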