[2602.16650] Retrieval Augmented Generation of Literature-derived Polymer Knowledge: The Example of a Biodegradable Polymer Expert System
Summary
This paper applies retrieval-augmented generation (RAG) to polymer knowledge extracted from the literature, using biodegradable polyhydroxyalkanoates (PHAs) as its case study. It compares two retrieval pipelines, one built on dense vector embeddings and one built on a knowledge graph, for retrieving evidence and reasoning across studies.
Why It Matters
Much of the experimental knowledge in polymer literature is buried in unstructured text and inconsistent terminology. Existing tools tend to extract narrow, study-specific facts that lose the cross-study context needed to answer broader scientific questions. Retrieval methods that preserve that context make it easier for researchers to draw insights from existing studies, which supports the development of new biodegradable materials.
Key Takeaways
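- RAG, which pairs large language models (LLMs) with external retrieval, can help answer questions over polymer literature, but its effectiveness depends strongly on how domain knowledge is represented.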
- Two retrieval pipelines, VectorRAG (dense paragraph embeddings) and GraphRAG (a canonicalized knowledge graph), offer complementary strengths in precision and recall.
- Expert validation confirms the effectiveness of these systems in producing reliable, evidence-based responses.
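The summary does not describe how VectorRAG is implemented. As a rough illustration of the general dense-retrieval idea only, the sketch below scores paragraphs against a query by cosine similarity, keeps the top k, and numbers them in a prompt so an LLM can cite its evidence. The `embed` function is a hypothetical stand-in: it uses a bag-of-words count vector instead of a learned embedding model so the example runs without dependencies. All function names and the example paragraphs are invented for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, paragraphs: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Score every paragraph against the query and return the top k, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(p)), p) for p in paragraphs]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


def build_prompt(query: str, hits: list[tuple[float, str]]) -> str:
    """Number the retrieved paragraphs so the LLM can cite them as evidence."""
    context = "\n".join(f"[{i}] {p}" for i, (_, p) in enumerate(hits, start=1))
    return (
        "Answer using only the numbered context and cite it.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Invented placeholder paragraphs, not data from the paper.
paragraphs = [
    "PHB film degradation was measured in soil",
    "PLA tensile strength was compared with PET",
    "PHBV crystallinity was studied by DSC",
]
query = "How does PHB degrade in soil"
hits = retrieve(query, paragraphs, k=2)
prompt = build_prompt(query, hits)
print(prompt)
```

In a realistic setup, paragraph vectors would be computed once and stored in a vector index rather than re-embedded for every query. The retrieval-then-prompt structure would stay the same.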
Computer Science > Computational Engineering, Finance, and Science
arXiv:2602.16650 (cs), submitted on 18 Feb 2026
Authors: Sonakshi Gupta, Akhlak Mahmood, Wei Xiong, Rampi Ramprasad
Abstract: Polymer literature contains a large and growing body of experimental knowledge, yet much of it is buried in unstructured text and inconsistent terminology, making systematic retrieval and reasoning difficult. Existing tools typically extract narrow, study-specific facts in isolation, failing to preserve the cross-study context required to answer broader scientific questions. Retrieval-augmented generation (RAG) offers a promising way to overcome this limitation by combining large language models (LLMs) with external retrieval, but its effectiveness depends strongly on how domain knowledge is represented. In this work, we develop two retrieval pipelines: a dense semantic vector-based approach (VectorRAG) and a graph-based approach (GraphRAG). Using over 1,000 polyhydroxyalkanoate (PHA) papers, we construct context-preserving paragraph embeddings and a canonicalized structured knowledge graph supporting entity disambiguation and multi-hop...
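The abstract mentions a canonicalized knowledge graph that supports entity disambiguation and multi-hop reasoning, but it gives no implementation details. The sketch below illustrates those two ideas under stated assumptions. A hypothetical alias table (`ALIASES`) maps surface forms such as `P(3HB)` or `polyhydroxyalkanoate` onto one canonical node each. A breadth-first search then collects every relation path up to `max_hops`. The function names, the alias table, and the triples are all invented for illustration and are not the paper's graph or code.

```python
from collections import defaultdict, deque

# Hypothetical alias table for entity disambiguation: lowercase surface form -> canonical name.
ALIASES = {
    "poly(3-hydroxybutyrate)": "PHB",
    "p(3hb)": "PHB",
    "phb": "PHB",
    "polyhydroxyalkanoate": "PHA",
    "pha": "PHA",
    "pha depolymerase": "PHA depolymerase",
}


def canon(name: str) -> str:
    """Map a surface form to its canonical entity name (unchanged if not in the table)."""
    return ALIASES.get(name.strip().lower(), name.strip())


def build_graph(triples):
    """Build an adjacency list from (head, relation, tail) triples after canonicalization."""
    graph = defaultdict(list)
    for head, rel, tail in triples:
        graph[canon(head)].append((rel, canon(tail)))
    return graph


def multi_hop(graph, start: str, max_hops: int = 2):
    """Breadth-first search returning every relation path of length <= max_hops from start."""
    start = canon(start)
    paths = []
    queue = deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if len(path) == max_hops:
            continue  # hop budget used up; do not expand further
        for rel, nxt in graph.get(node, []):
            new_path = path + [(node, rel, nxt)]
            paths.append(new_path)
            queue.append((nxt, new_path))
    return paths


# Invented toy triples using different surface forms for the same entities.
triples = [
    ("poly(3-hydroxybutyrate)", "is_a", "polyhydroxyalkanoate"),
    ("P(3HB)", "studied_in", "paper_A"),
    ("PHA", "degraded_by", "PHA depolymerase"),
]
graph = build_graph(triples)
paths = multi_hop(graph, "phb", max_hops=2)
for p in paths:
    print(" -> ".join(f"{h} [{r}] {t}" for h, r, t in p))
```

Because `polyhydroxyalkanoate` and `PHA` collapse onto one node, the two-hop path from PHB through PHA to PHA depolymerase is found. Without canonicalization, the `is_a` edge and the `degraded_by` edge would hang off different nodes, and the path would be missed.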