[2602.13647] PT-RAG: Structure-Fidelity Retrieval-Augmented Generation for Academic Papers

arXiv - AI · 4 min read

Summary

PT-RAG introduces a novel framework for retrieval-augmented generation that maintains the hierarchical structure of academic papers, improving evidence allocation and answer quality in question-answering tasks.

Why It Matters

This research addresses significant limitations in existing retrieval-augmented generation methods by preserving the structural integrity of academic papers. By reducing context fragmentation and enhancing evidence allocation, PT-RAG has the potential to improve the performance of language models in academic settings, which is crucial for researchers and practitioners relying on accurate information retrieval.

Key Takeaways

  • PT-RAG preserves the hierarchical structure of academic papers for better retrieval.
  • The framework reduces context fragmentation and improves evidence allocation accuracy.
  • Entropy-based diagnostics are introduced to assess retrieval performance (see the sketch after this list).
  • PT-RAG outperforms existing methods on academic question-answering benchmarks.
  • The approach enhances answer quality by providing coherent retrieval contexts.
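
The takeaways name entropy-based diagnostics without spelling them out. As a rough illustration only, not the paper's method: one way to diagnose retrieval dispersion is to compute the Shannon entropy of the relevance scores a retriever assigns across a paper's sections. The scores below are hypothetical; lower entropy means the evidence is concentrated in a few sections, which is what a structure-preserving index aims for.

```python
import math

def retrieval_entropy(scores: list[float]) -> float:
    """Shannon entropy (bits) of a normalized relevance-score distribution.

    Low entropy: relevance concentrates on a few sections, so the retriever
    knows where the evidence lives. High entropy: relevance is smeared across
    the paper, a symptom of a fragmented retrieval space.
    """
    total = sum(scores)
    if total <= 0:
        return 0.0
    probs = (s / total for s in scores if s > 0)
    return -sum(p * math.log2(p) for p in probs)

# Hypothetical per-section relevance scores for one query.
flat_chunks = [0.21, 0.19, 0.22, 0.18, 0.20]  # near-uniform: high entropy
tree_paths  = [0.70, 0.20, 0.05, 0.03, 0.02]  # concentrated: low entropy
print(f"{retrieval_entropy(flat_chunks):.2f} bits")  # ~2.32
print(f"{retrieval_entropy(tree_paths):.2f} bits")   # ~1.31
```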

Computer Science > Information Retrieval
arXiv:2602.13647 (cs) [Submitted on 14 Feb 2026]

Title: PT-RAG: Structure-Fidelity Retrieval-Augmented Generation for Academic Papers
Authors: Rui Yu, Tianyi Wang, Ruixia Liu, Yinglong Wang

Abstract: Retrieval-augmented generation (RAG) is increasingly applied to question-answering over long academic papers, where accurate evidence allocation under a fixed token budget is critical. Existing approaches typically flatten academic papers into unstructured chunks during preprocessing, which destroys their native hierarchical structure. This loss forces retrieval to operate in a disordered space, producing fragmented contexts, misallocating tokens to non-evidential regions under finite token budgets, and increasing the reasoning burden for downstream language models. To address these issues, we propose PT-RAG, a RAG framework that treats the native hierarchical structure of academic papers as a low-entropy retrieval prior. PT-RAG first inherits the native hierarchy to construct a structure-fidelity PaperTree index, which prevents entropy increase at the source. It then designs a path-guided retrieval mechanism that aligns query semantics to relevant sections and selects high-relevance root-to-leaf paths under a fixed token budget, yielding compact, coherent, and low-e...
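
The abstract names the PaperTree index and path-guided retrieval but gives no implementation details. The following is a minimal sketch under stated assumptions: `Node`, `relevance`, and `select_paths` are hypothetical names, the term-overlap scorer stands in for whatever semantic alignment the paper actually uses, and the greedy budgeted selection is one plausible reading of "selects high-relevance root-to-leaf paths under a fixed token budget".

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One section of the paper: a heading plus its local text."""
    title: str
    text: str = ""
    children: list["Node"] = field(default_factory=list)

    def tokens(self) -> int:
        # Crude whitespace token count; a real system would use the
        # generator's tokenizer.
        return len((self.title + " " + self.text).split())

def relevance(node: Node, query_terms: set[str]) -> int:
    # Stand-in scorer: raw term overlap. PT-RAG aligns query semantics to
    # sections, presumably with embeddings; this keeps the sketch runnable.
    return len(set((node.title + " " + node.text).lower().split()) & query_terms)

def select_paths(root: Node, query: str, budget: int) -> list[list[Node]]:
    """Greedily keep the highest-relevance root-to-leaf paths that fit
    within a fixed token budget."""
    query_terms = set(query.lower().split())
    paths: list[tuple[int, int, list[Node]]] = []

    def walk(node: Node, prefix: list[Node]) -> None:
        prefix = prefix + [node]
        if not node.children:  # leaf: one complete root-to-leaf path
            rel = sum(relevance(n, query_terms) for n in prefix)
            cost = sum(n.tokens() for n in prefix)
            paths.append((rel, cost, prefix))
        for child in node.children:
            walk(child, prefix)

    walk(root, [])
    selected, used = [], 0
    for rel, cost, path in sorted(paths, key=lambda p: -p[0]):
        # Shared ancestors are double-counted across paths here; a fuller
        # version would deduplicate nodes already paid for.
        if used + cost <= budget:
            selected.append(path)
            used += cost
    return selected

# Toy usage: a two-section paper and a method-focused query.
paper = Node("PT-RAG paper", children=[
    Node("Introduction", "RAG on long papers under token budgets."),
    Node("Method", children=[
        Node("PaperTree index", "Inherit the native section hierarchy."),
        Node("Path-guided retrieval", "Select root-to-leaf paths by relevance."),
    ]),
])
for path in select_paths(paper, "path guided retrieval token budget", budget=22):
    print(" > ".join(n.title for n in path))
```

Keeping whole root-to-leaf paths, rather than isolated chunks, is what would yield the coherent contexts the abstract describes: every selected passage arrives with its section ancestry attached.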

Related Articles

Machine Learning

[D] ICML 26 - What to do with the zero follow-up questions

Hello everyone. I submitted my work to ICML 26 this year, and it got somewhat above average reviews. Now, in the rebuttal acknowledgment,...

Reddit - Machine Learning · 1 min ·
Nlp

Startup Battlefield 200 applications open until May 27 | TechCrunch

Nominate your startup, or one you know, and apply for a chance at VC access, TechCrunch coverage, and $100K for Startup Battlefield 200.

TechCrunch - AI · 4 min ·
Llms

[2603.24326] Boosting Document Parsing Efficiency and Performance with Coarse-to-Fine Visual Processing

Abstract page for arXiv paper 2603.24326: Boosting Document Parsing Efficiency and Performance with Coarse-to-Fine Visual Processing

arXiv - AI · 4 min ·
Nlp

[2601.13508] Autonomous Computational Catalysis Research via Agentic Systems

Abstract page for arXiv paper 2601.13508: Autonomous Computational Catalysis Research via Agentic Systems

arXiv - AI · 3 min ·