[2602.13647] PT-RAG: Structure-Fidelity Retrieval-Augmented Generation for Academic Papers
Summary
PT-RAG introduces a novel framework for retrieval-augmented generation that maintains the hierarchical structure of academic papers, improving evidence allocation and answer quality in question-answering tasks.
Why It Matters
This research addresses significant limitations in existing retrieval-augmented generation methods by preserving the structural integrity of academic papers. By reducing context fragmentation and enhancing evidence allocation, PT-RAG has the potential to improve the performance of language models in academic settings, which is crucial for researchers and practitioners relying on accurate information retrieval.
Key Takeaways
- PT-RAG preserves the hierarchical structure of academic papers for better retrieval.
- The framework reduces context fragmentation and improves evidence allocation accuracy.
- Entropy-based diagnostics are introduced to assess retrieval performance.
- PT-RAG outperforms existing methods on academic question-answering benchmarks.
- The approach enhances answer quality by providing coherent retrieval contexts.
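The entropy-based diagnostic mentioned above is not specified in detail here, but the underlying idea can be illustrated with a minimal sketch: treat the normalized relevance scores of retrieved chunks as a probability distribution and measure its Shannon entropy. Low entropy means the retrieval mass is concentrated on a few coherent regions; high entropy suggests fragmented, scattered evidence. The function name and the score-based formulation are illustrative assumptions, not the paper's exact metric.

```python
import math

def retrieval_entropy(scores):
    """Shannon entropy (bits) of a retrieval's normalized relevance scores.

    Illustrative diagnostic, not the paper's exact formulation:
    low entropy -> evidence concentrated in few chunks (coherent context);
    high entropy -> relevance spread thinly across the paper (fragmented).
    """
    total = sum(scores)
    probs = [s / total for s in scores if s > 0]
    return -sum(p * math.log2(p) for p in probs)
```

For example, four equally-weighted chunks give 2.0 bits (maximally dispersed), while a single dominant chunk gives 0 bits (fully concentrated).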
Paper Details
Computer Science > Information Retrieval — arXiv:2602.13647 (cs). Submitted on 14 Feb 2026.
Authors: Rui Yu, Tianyi Wang, Ruixia Liu, Yinglong Wang
Abstract: Retrieval-augmented generation (RAG) is increasingly applied to question-answering over long academic papers, where accurate evidence allocation under a fixed token budget is critical. Existing approaches typically flatten academic papers into unstructured chunks during preprocessing, which destroys the native hierarchical structure. This loss forces retrieval to operate in a disordered space, thereby producing fragmented contexts, misallocating tokens to non-evidential regions under finite token budgets, and increasing the reasoning burden for downstream language models. To address these issues, we propose PT-RAG, a RAG framework that treats the native hierarchical structure of academic papers as a low-entropy retrieval prior. PT-RAG first inherits the native hierarchy to construct a structure-fidelity PaperTree index, which prevents entropy increase at the source. It then designs a path-guided retrieval mechanism that aligns query semantics to relevant sections and selects high-relevance root-to-leaf paths under a fixed token budget, yielding compact, coherent, and low-e...
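The abstract's two-step mechanism — a hierarchy-preserving PaperTree index plus budgeted selection of root-to-leaf paths — can be sketched as follows. This is a minimal, illustrative reconstruction under stated assumptions: the node fields, the greedy budget strategy, and the externally supplied `score_fn` (standing in for the paper's query-to-section semantic alignment) are all assumptions, not the authors' implementation.

```python
from dataclasses import dataclass, field

@dataclass
class PaperNode:
    """One section of the paper's native hierarchy (assumed structure)."""
    title: str
    text: str = ""
    tokens: int = 0               # token cost of including this node
    children: list = field(default_factory=list)

def root_to_leaf_paths(node, prefix=None):
    """Enumerate every root-to-leaf path in the PaperTree."""
    prefix = (prefix or []) + [node]
    if not node.children:
        yield prefix
    else:
        for child in node.children:
            yield from root_to_leaf_paths(child, prefix)

def select_paths(root, score_fn, budget):
    """Greedily keep the highest-scoring paths that fit the token budget.

    score_fn(path) -> float is a stand-in for query/section semantic
    alignment; shared ancestors are counted per path for simplicity.
    """
    scored = sorted(root_to_leaf_paths(root), key=score_fn, reverse=True)
    chosen, used = [], 0
    for path in scored:
        cost = sum(n.tokens for n in path)
        if used + cost <= budget:
            chosen.append(path)
            used += cost
    return chosen
```

Because each selected unit is a full root-to-leaf path rather than an isolated chunk, the resulting context keeps a section's ancestors attached, which is the structural coherence the abstract attributes to the method.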