[2603.25226] WebTestBench: Evaluating Computer-Use Agents towards End-to-End Automated Web Testing
Computer Science > Software Engineering
arXiv:2603.25226 (cs) [Submitted on 26 Mar 2026]

Title: WebTestBench: Evaluating Computer-Use Agents towards End-to-End Automated Web Testing
Authors: Fanheng Kong, Jingyuan Zhang, Yang Yue, Chenxi Sun, Yang Tian, Shi Feng, Xiaocui Yang, Daling Wang, Yu Tian, Jun Du, Wenchong Zeng, Han Li, Kun Gai

Abstract: The emergence of Large Language Models (LLMs) has catalyzed a paradigm shift in programming, giving rise to "vibe coding", where users can build complete projects and even control computers using natural language instructions. This paradigm has driven automated webpage development, but it introduces a new requirement: automatically verifying whether web functionalities are reliably implemented. Existing works struggle to adapt, relying on static visual similarity or predefined checklists that constrain their utility in open-ended environments. Furthermore, they overlook a vital aspect of software quality, namely latent logical constraints. To address these gaps, we introduce WebTestBench, a benchmark for evaluating end-to-end automated web testing. WebTestBench encompasses comprehensive testing dimensions across diverse web application categories. We decompose the testing process into two cascaded sub-tasks, checklist generation and defect detection ...
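To make the cascaded decomposition concrete, the sketch below shows a minimal two-stage testing loop: a checklist-generation step that turns an app description into executable functional checks, followed by a defect-detection step that runs each check against the implementation. All names here (ChecklistItem, generate_checklist, detect_defects, the toy app functions) are illustrative assumptions, not the paper's actual benchmark interface; in WebTestBench the generation step would be performed by a computer-use agent rather than hard-coded rules.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ChecklistItem:
    description: str           # natural-language functional requirement
    check: Callable[[], bool]  # executable verification for the item


@dataclass
class DefectReport:
    item: str
    passed: bool


# --- Toy implementation under test (cart_total is deliberately buggy) ---
def validate_login(user: str, password: str) -> bool:
    return bool(user) and bool(password)


def cart_total(prices: List[float]) -> float:
    return round(sum(prices))  # bug: drops fractional amounts


# --- Sub-task 1: checklist generation (hypothetical stand-in for an agent) ---
def generate_checklist(app_spec: str) -> List[ChecklistItem]:
    return [
        ChecklistItem("Login form rejects empty credentials",
                      check=lambda: validate_login("", "") is False),
        ChecklistItem("Cart total equals sum of item prices",
                      check=lambda: cart_total([3.0, 4.5]) == 7.5),
    ]


# --- Sub-task 2: defect detection (execute each check, report outcomes) ---
def detect_defects(checklist: List[ChecklistItem]) -> List[DefectReport]:
    return [DefectReport(item.description, item.check()) for item in checklist]


if __name__ == "__main__":
    checklist = generate_checklist("small e-commerce demo app")
    for report in detect_defects(checklist):
        status = "PASS" if report.passed else "DEFECT"
        print(f"[{status}] {report.item}")
```

Running this prints one PASS and one DEFECT line, illustrating how a generated checklist item can surface a latent logical constraint (here, the cart total losing fractional amounts) that a purely visual comparison would miss.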