[2602.18548] 1D-Bench: A Benchmark for Iterative UI Code Generation with Visual Feedback in Real-World
Summary
The paper introduces 1D-Bench, a benchmark for evaluating iterative UI code generation with visual feedback, aimed at improving design-to-code processes in real-world applications.
Why It Matters
1D-Bench addresses the challenges of inconsistent datasets and evaluation methods in UI code generation, providing a standardized framework that enhances the robustness and efficiency of design-to-code tasks, particularly in e-commerce workflows.
Key Takeaways
- 1D-Bench offers a standardized benchmark for iterative UI code generation.
- It emphasizes visual feedback and robustness to extraction errors in the exported intermediate representation.
- The framework supports generating React codebases under fixed toolchains.
- Iterative editing significantly improves rendering success and visual similarity.
- Pilot studies indicate potential for reinforcement learning in enhancing model performance.
Computer Science > Software Engineering
arXiv:2602.18548 (cs)
[Submitted on 20 Feb 2026]
Title: 1D-Bench: A Benchmark for Iterative UI Code Generation with Visual Feedback in Real-World
Authors: Qiao Xu, Yipeng Yu, Chengxiao Feng, Xu Liu
Abstract: Design-to-code translates high-fidelity UI designs into executable front-end implementations, but progress remains hard to compare due to inconsistent datasets, toolchains, and evaluation protocols. We introduce 1D-Bench, a benchmark grounded in real e-commerce workflows, where each instance provides a reference rendering and an exported intermediate representation that may contain extraction errors. 1D is short for "one day", representing the efficient completion of design-to-code tasks in less than one day. Models take both as input, using the intermediate representation as structural cues while being evaluated against the reference rendering, which tests robustness to intermediate-representation defects rather than literal adherence. 1D-Bench requires generating an executable React codebase under a fixed toolchain with an explicit component hierarchy, and defines a multi-round setting in which models iteratively apply component-level edits using execution feedback. Experiments on commercial and open-weight multimodal models show that iterative editing generally improves fi...
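The multi-round setting the abstract describes can be sketched as a loop: generate a codebase from the intermediate representation, render it, score visual similarity against the reference rendering, and feed that execution feedback back for component-level edits until similarity clears a threshold or the round budget runs out. The sketch below is purely illustrative: every name (`StubModel`, `render`, `visual_similarity`, `iterative_ui_codegen`) is an assumed stand-in, not the benchmark's actual API, and the model and renderer are toy stubs that encode "fidelity" as a number.

```python
# Hedged sketch of a multi-round UI-codegen loop with visual feedback.
# All names and behaviors here are illustrative assumptions, not 1D-Bench's API.
from dataclasses import dataclass


@dataclass
class StubModel:
    """Toy stand-in for a multimodal model; each feedback round raises quality."""
    quality: float = 0.6

    def generate_code(self, design_ir: str) -> str:
        # Pretend this emits a React codebase; we encode fidelity in the string.
        return f"<App quality={self.quality}>"

    def propose_edits(self, code: str, score: float) -> float:
        # Pretend the model compares rendering vs. reference and returns
        # component-level edits; here, a fixed quality increment.
        return 0.15


def render(code: str) -> float:
    # Stand-in for executing the fixed React toolchain and screenshotting.
    return float(code.split("quality=")[1].rstrip(">"))


def visual_similarity(rendering: float, reference: float) -> float:
    # Stand-in for a pixel/structure similarity metric in [0, 1].
    return 1.0 - abs(reference - rendering)


def iterative_ui_codegen(model, design_ir, reference=1.0,
                         max_rounds=5, threshold=0.95):
    """Multi-round loop: stop once similarity clears the threshold."""
    code = model.generate_code(design_ir)
    history = []
    for _ in range(max_rounds):
        score = visual_similarity(render(code), reference)
        history.append(score)
        if score >= threshold:
            break
        # Apply feedback-driven edits and regenerate.
        model.quality = min(1.0, model.quality + model.propose_edits(code, score))
        code = model.generate_code(design_ir)
    return code, history


code, history = iterative_ui_codegen(StubModel(), design_ir="<exported-ir>")
```

With these stubs, each round's similarity score rises monotonically until the threshold is met, mirroring the paper's finding that iterative editing generally improves results over single-shot generation.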