[2508.01222] WebDS: An End-to-End Benchmark for Web-based Data Science
Computer Science > Computation and Language
arXiv:2508.01222 (cs) [Submitted on 2 Aug 2025 (v1), last revised 4 Mar 2026 (this version, v2)]
Authors: Ethan Hsu, Hong Meng Yam, Ines Bouissou, Aaron Murali John, Raj Thota, Josh Koe, Vivek Sarath Putta, G K Dharesan, Alexander Spangher, Shikhar Murty, Tenghao Huang, Christopher D. Manning

Abstract: Many real-world data science tasks involve complex web-based interactions: finding appropriate data available on the internet, synthesizing multimodal data from different locations, and producing summarized analyses. Existing web benchmarks often focus on simplistic interactions and frequently do not require diverse tool-using capabilities. Conversely, traditional data science benchmarks typically concentrate on static, highly structured datasets and do not assess end-to-end workflows that encompass data acquisition, cleaning, analysis, and insight generation. In response, we introduce WebDS, the first end-to-end web-based data science benchmark. It comprises 870 web-based data science tasks across 29 diverse websites, ranging from structured government data portals to unstructured news media, and challenges agents to perform complex, multi-step, tool-based operations across heterogeneous data formats to better reflect the realities of...