[2603.26648] Vision2Web: A Hierarchical Benchmark for Visual Website Development with Agent Verification
arXiv:2603.26648 (cs)
[Submitted on 27 Mar 2026]

Title: Vision2Web: A Hierarchical Benchmark for Visual Website Development with Agent Verification

Authors: Zehai He, Wenyi Hong, Zhen Yang, Ziyang Pan, Mingdao Liu, Xiaotao Gu, Jie Tang

Abstract: Recent advances in large language models have improved the capabilities of coding agents, yet systematic evaluation of complex, end-to-end website development remains limited. To address this gap, we introduce Vision2Web, a hierarchical benchmark for visual website development that spans static UI-to-code generation, interactive multi-page frontend reproduction, and long-horizon full-stack website development. The benchmark is constructed from real-world websites and comprises 193 tasks across 16 categories, with 918 prototype images and 1,255 test cases. To support flexible, thorough, and reliable evaluation, we propose a workflow-based agent verification paradigm built on two complementary components: a GUI agent verifier and a VLM-based judge. We evaluate multiple visual language models instantiated under different coding-agent frameworks, revealing substantial performance gaps at all task levels, with state-of-the-art models still struggling on full-stack development.

Subjects: Software Engineering ...