[2603.04452] A unified foundational framework for knowledge injection and evaluation of Large Language Models in Combustion Science
Computer Science > Computation and Language
arXiv:2603.04452 (cs) [Submitted on 27 Feb 2026]

Title: A unified foundational framework for knowledge injection and evaluation of Large Language Models in Combustion Science
Authors: Zonglin Yang, Runze Mao, Tianhao Wu, Han Li, QingGuo Zhou, Zhi X. Chen

Abstract: To advance foundation Large Language Models (LLMs) for combustion science, this study presents the first end-to-end framework for developing domain-specialized models for the combustion community. The framework comprises an AI-ready multimodal knowledge base at the 3.5 billion-token scale, extracted from over 200,000 peer-reviewed articles, 8,000 theses and dissertations, and approximately 400,000 lines of combustion CFD code; a rigorous and largely automated evaluation benchmark (CombustionQA, 436 questions across eight subfields); and a three-stage knowledge-injection pathway that progresses from lightweight retrieval-augmented generation (RAG) to knowledge-graph-enhanced retrieval and continued pretraining. We first quantitatively validate Stage 1 (naive RAG) and find a hard ceiling: standard RAG accuracy peaks at 60%, far surpassing zero-shot performance (23%) yet well below the theoretical upper bound (87%). We further demonstrate that this stage's perform...
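
The Stage 1 setup the abstract describes (naive RAG evaluated for accuracy on CombustionQA-style multiple-choice questions) can be illustrated with a minimal sketch. This is not the authors' implementation: the retriever, the `ask_llm` placeholder, and the question-item format are all assumptions made for illustration only.

```python
# Minimal sketch of a "naive RAG" evaluation loop, assuming multiple-choice
# items of the form {"question": str, "choices": str, "answer": "A"|"B"|...}.
# The corpus chunks, retriever, and LLM client are hypothetical stand-ins.
from collections import Counter
import math
import re


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


class TfIdfRetriever:
    """Toy bag-of-words TF-IDF retriever over a list of text chunks."""

    def __init__(self, chunks):
        self.chunks = chunks
        self.doc_tokens = [Counter(tokenize(c)) for c in chunks]
        df = Counter()
        for toks in self.doc_tokens:
            df.update(toks.keys())
        n = len(chunks)
        self.idf = {t: math.log((n + 1) / (c + 1)) + 1 for t, c in df.items()}

    def top_k(self, query, k=3):
        q = Counter(tokenize(query))
        scored = []
        for i, d in enumerate(self.doc_tokens):
            score = sum(q[t] * d[t] * self.idf.get(t, 0.0) for t in q)
            scored.append((score, i))
        scored.sort(reverse=True)
        return [self.chunks[i] for _, i in scored[:k]]


def ask_llm(prompt):
    """Placeholder for a chat-completion call; swap in a real client here."""
    raise NotImplementedError


def evaluate(questions, retriever, k=3):
    """Accuracy of retrieval-augmented answers on multiple-choice items."""
    correct = 0
    for item in questions:
        context = "\n\n".join(retriever.top_k(item["question"], k))
        prompt = (
            f"Context:\n{context}\n\n"
            f"Question: {item['question']}\n"
            f"Choices: {item['choices']}\n"
            "Answer with a single letter."
        )
        if ask_llm(prompt).strip().upper().startswith(item["answer"]):
            correct += 1
    return correct / len(questions)
```

Under this kind of harness, the zero-shot baseline corresponds to calling the model without the retrieved `context` block, and the reported ceiling reflects how often the relevant evidence is present in the retrieved chunks at all.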