[2604.06793] Evaluating Repository-level Software Documentation via Question Answering and Feature-Driven Development
Computer Science > Software Engineering
arXiv:2604.06793 (cs)
[Submitted on 8 Apr 2026]

Title: Evaluating Repository-level Software Documentation via Question Answering and Feature-Driven Development
Authors: Xinchen Wang, Ruida Hu, Cuiyun Gao, Pengfei Gao, Chao Peng

Abstract: Software documentation is crucial for repository comprehension. While Large Language Models (LLMs) advance documentation generation from code snippets to entire repositories, existing benchmarks have two key limitations: (1) they lack a holistic, repository-level assessment, and (2) they rely on unreliable evaluation strategies, such as LLM-as-a-judge, which suffers from vague criteria and limited repository-level knowledge. To address these issues, we introduce SWD-Bench, a novel benchmark for evaluating repository-level software documentation. Inspired by documentation-driven development, our strategy evaluates documentation quality by assessing an LLM's ability to understand and implement functionalities using the documentation, rather than by directly scoring it. This is measured through function-driven Question Answering (QA) tasks. SWD-Bench comprises three interconnected QA tasks: (1) Functionality Detection, to determine if a functionality is described; (2) Functionality Localization, to evaluate ...
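
As a rough illustration of the idea of scoring documentation indirectly through QA, the sketch below computes accuracy on a Functionality Detection style task. It is not SWD-Bench's implementation: the `DetectionItem` record, the `keyword_answerer` function (a toy stand-in for an LLM), and the three sample items are all hypothetical and exist only to show the shape of such an evaluation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DetectionItem:
    """One hypothetical Functionality Detection question (not the benchmark's schema)."""
    documentation: str   # documentation text shown to the answerer
    functionality: str   # short description of the functionality asked about
    is_described: bool   # gold label: does the documentation describe it?


def keyword_answerer(documentation: str, functionality: str) -> bool:
    """Toy stand-in for an LLM: answer yes only if every query word occurs in the documentation."""
    doc_words = set(documentation.lower().split())
    return all(word in doc_words for word in functionality.lower().split())


def detection_accuracy(
    items: List[DetectionItem],
    answerer: Callable[[str, str], bool],
) -> float:
    """Fraction of items where the answerer's yes/no matches the gold label."""
    if not items:
        raise ValueError("need at least one item")
    correct = sum(
        answerer(item.documentation, item.functionality) == item.is_described
        for item in items
    )
    return correct / len(items)


# Made-up sample items, for illustration only.
items = [
    DetectionItem("the cli supports json export of reports", "json export", True),
    DetectionItem("the cli supports json export of reports", "csv import", False),
    DetectionItem("reports can be exported as json files", "json export", True),
]
score = detection_accuracy(items, keyword_answerer)
print(f"detection accuracy: {score:.2f}")
```

The third item is answered incorrectly because the toy answerer matches exact words ("exported" is not "export"), which is the kind of gap a real LLM answerer would be expected to close. Under this framing, a higher score on the same questions suggests the documentation conveys the functionality more clearly.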