[2602.13647] PT-RAG: Structure-Fidelity Retrieval-Augmented Generation for Academic Papers
Summary
PT-RAG introduces a novel framework for retrieval-augmented generation that maintains the hierarchical structure of academic papers, improving evidence allocation and answer quality in question-answering tasks.
Why It Matters
This research addresses significant limitations in existing retrieval-augmented generation methods by preserving the structural integrity of academic papers. By reducing context fragmentation and enhancing evidence allocation, PT-RAG has the potential to improve the performance of language models in academic settings, which is crucial for researchers and practitioners relying on accurate information retrieval.
Key Takeaways
- PT-RAG preserves the hierarchical structure of academic papers for better retrieval.
- The framework reduces context fragmentation and improves evidence allocation accuracy.
- Entropy-based diagnostics are introduced to assess retrieval performance.
- PT-RAG outperforms existing methods on academic question-answering benchmarks.
- The approach enhances answer quality by providing coherent retrieval contexts.
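The entropy-based diagnostic mentioned above is not specified in detail here, but the underlying idea can be illustrated with a minimal sketch: treat the normalized relevance scores of retrieved chunks as a probability distribution and measure its Shannon entropy. Low entropy means the retrieval mass is concentrated on a few coherent regions; high entropy suggests fragmented, scattered evidence. The function name and the score-based formulation are illustrative assumptions, not the paper's exact metric.

```python
import math

def retrieval_entropy(scores):
    """Shannon entropy (bits) of a retrieval's normalized relevance scores.

    Illustrative diagnostic, not the paper's exact formulation:
    low entropy -> evidence concentrated in few chunks (coherent context);
    high entropy -> relevance spread thinly across the paper (fragmented).
    """
    total = sum(scores)
    probs = [s / total for s in scores if s > 0]
    return -sum(p * math.log2(p) for p in probs)
```

For example, four equally-weighted chunks give 2.0 bits (maximally dispersed), while a single dominant chunk gives 0 bits (fully concentrated).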
Paper Details
Computer Science > Information Retrieval — arXiv:2602.13647 (cs). Submitted on 14 Feb 2026.
Authors: Rui Yu, Tianyi Wang, Ruixia Liu, Yinglong Wang
Abstract: Retrieval-augmented generation (RAG) is increasingly applied to question-answering over long academic papers, where accurate evidence allocation under a fixed token budget is critical. Existing approaches typically flatten academic papers into unstructured chunks during preprocessing, which destroys the native hierarchical structure. This loss forces retrieval to operate in a disordered space, thereby producing fragmented contexts, misallocating tokens to non-evidential regions under finite token budgets, and increasing the reasoning burden for downstream language models. To address these issues, we propose PT-RAG, a RAG framework that treats the native hierarchical structure of academic papers as a low-entropy retrieval prior. PT-RAG first inherits the native hierarchy to construct a structure-fidelity PaperTree index, which prevents entropy increase at the source. It then designs a path-guided retrieval mechanism that aligns query semantics to relevant sections and selects high-relevance root-to-leaf paths under a fixed token budget, yielding compact, coherent, and low-e...
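The abstract's two-step mechanism — a hierarchy-preserving PaperTree index plus budgeted selection of root-to-leaf paths — can be sketched as follows. This is a minimal, illustrative reconstruction under stated assumptions: the node fields, the greedy budget strategy, and the externally supplied `score_fn` (standing in for the paper's query-to-section semantic alignment) are all assumptions, not the authors' implementation.

```python
from dataclasses import dataclass, field

@dataclass
class PaperNode:
    """One section of the paper's native hierarchy (assumed structure)."""
    title: str
    text: str = ""
    tokens: int = 0               # token cost of including this node
    children: list = field(default_factory=list)

def root_to_leaf_paths(node, prefix=None):
    """Enumerate every root-to-leaf path in the PaperTree."""
    prefix = (prefix or []) + [node]
    if not node.children:
        yield prefix
    else:
        for child in node.children:
            yield from root_to_leaf_paths(child, prefix)

def select_paths(root, score_fn, budget):
    """Greedily keep the highest-scoring paths that fit the token budget.

    score_fn(path) -> float is a stand-in for query/section semantic
    alignment; shared ancestors are counted per path for simplicity.
    """
    scored = sorted(root_to_leaf_paths(root), key=score_fn, reverse=True)
    chosen, used = [], 0
    for path in scored:
        cost = sum(n.tokens for n in path)
        if used + cost <= budget:
            chosen.append(path)
            used += cost
    return chosen
```

Because each selected unit is a full root-to-leaf path rather than an isolated chunk, the resulting context keeps a section's ancestors attached, which is the structural coherence the abstract attributes to the method.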