[2508.06931] Automated Formalization via Conceptual Retrieval-Augmented LLMs
arXiv:2508.06931 (cs) — Computer Science > Artificial Intelligence
[Submitted on 9 Aug 2025 (v1), last revised 21 Mar 2026 (this version, v2)]

Title: Automated Formalization via Conceptual Retrieval-Augmented LLMs
Authors: Wangyue Lu, Lun Du, Sirui Li, Ke Weng, Haozhe Sun, Hengyu Liu, Minghe Yu, Tiancheng Zhang, Ge Yu

Abstract: Interactive theorem provers (ITPs) require manual formalization, which is labor-intensive and demands expert knowledge. While automated formalization offers a potential solution, it faces two major challenges: model hallucination (e.g., undefined predicates, symbol misuse, and version incompatibility) and the semantic gap caused by ambiguous or missing premises in natural language descriptions. To address these issues, we propose CRAMF, a Concept-driven Retrieval-Augmented Mathematical Formalization framework. CRAMF enhances LLM-based autoformalization by retrieving formal definitions of core mathematical concepts, providing contextual grounding during code generation. However, applying retrieval-augmented generation (RAG) in this setting is non-trivial due to the lack of structured knowledge bases, the polymorphic nature of mathematical concepts, and the high precision required in formal retrieval. We introduce a framework for automatically constructing a concept-definition knowledge base from Mat...
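The core idea described in the abstract — grounding an LLM's formalization attempt in retrieved formal definitions of the concepts a statement mentions — can be illustrated with a minimal sketch. This is not the authors' implementation: the concept-definition knowledge base, concept names, and prompt layout below are illustrative assumptions, and the retrieval step is simplified to keyword matching rather than the precise formal retrieval the paper targets.

```python
# Toy concept-definition knowledge base. In CRAMF this is constructed
# automatically from a formal library; the entries here are illustrative
# placeholders, not real library definitions.
KNOWLEDGE_BASE = {
    "continuous": "def Continuous (f : X -> Y) : Prop := ...",
    "compact": "def IsCompact (s : Set X) : Prop := ...",
    "monotone": "def Monotone (f : A -> B) : Prop := ...",
}

def retrieve_definitions(statement: str) -> dict:
    """Return formal definitions for concepts mentioned in the statement.

    Simplified to substring matching; a real system would need precise,
    polymorphism-aware retrieval, as the abstract notes.
    """
    lowered = statement.lower()
    return {c: d for c, d in KNOWLEDGE_BASE.items() if c in lowered}

def build_prompt(statement: str) -> str:
    """Assemble an autoformalization prompt grounded in retrieved definitions."""
    defs = retrieve_definitions(statement)
    context = "\n".join(f"-- {c}: {d}" for c, d in sorted(defs.items()))
    return (
        "Formalize the statement below, using only these definitions:\n"
        f"{context}\n"
        f"Statement: {statement}\n"
    )

stmt = "Every continuous function on a compact set attains its maximum."
prompt = build_prompt(stmt)  # would be sent to the LLM as grounded context
```

The design point the sketch captures is that retrieval happens before generation, so the model sees the exact formal definitions it should use rather than hallucinating undefined predicates or version-incompatible symbols.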