[2509.18776] AECBench: A Hierarchical Benchmark for Knowledge Evaluation of Large Language Models in the AEC Field
Summary
The paper introduces AECBench, a benchmark for evaluating large language models (LLMs) in the Architecture, Engineering, and Construction (AEC) field, highlighting their strengths and limitations across cognitive tasks.
Why It Matters
As LLMs are increasingly integrated into the AEC sector, understanding their reliability and performance is crucial for ensuring safety and efficiency in engineering practices. AECBench provides a structured evaluation framework that can guide future developments in this area.
Key Takeaways
- AECBench establishes a five-level cognitive evaluation framework for LLMs.
- The benchmark includes 23 tasks derived from real AEC practices.
- Performance declines were noted in complex reasoning and document generation tasks.
- A dataset of 4,800 questions was created to assess LLM capabilities.
- The study lays groundwork for future research on LLM integration in safety-critical fields.
Computer Science > Computation and Language
arXiv:2509.18776 (cs)
[Submitted on 23 Sep 2025 (v1), last revised 14 Feb 2026 (this version, v3)]
Title: AECBench: A Hierarchical Benchmark for Knowledge Evaluation of Large Language Models in the AEC Field
Authors: Chen Liang, Zhaoqi Huang, Haofen Wang, Fu Chai, Chunying Yu, Huanhuan Wei, Zhengjie Liu, Yanpeng Li, Hongjun Wang, Ruifeng Luo, Xianzhong Zhao
Abstract: Large language models (LLMs), as a novel information technology, are seeing increasing adoption in the Architecture, Engineering, and Construction (AEC) field. They have shown their potential to streamline processes throughout the building lifecycle. However, the robustness and reliability of LLMs in such a specialized and safety-critical domain remain to be evaluated. To address this challenge, this paper establishes AECBench, a comprehensive benchmark designed to quantify the strengths and limitations of current LLMs in the AEC domain. The benchmark features a five-level, cognition-oriented evaluation framework (i.e., Knowledge Memorization, Understanding, Reasoning, Calculation, and Application). Based on the framework, 23 representative evaluation tasks were defined. These tasks were derived from authentic AEC practice, with scope ranging from codes retrieval to specializ...