[2603.05295] WebChain: A Large-Scale Human-Annotated Dataset of Real-World Web Interaction Traces

arXiv:2603.05295 (cs)
[Submitted on 5 Mar 2026]

Title: WebChain: A Large-Scale Human-Annotated Dataset of Real-World Web Interaction Traces

Authors: Sicheng Fan, Rui Wan, Yifei Leng, Gaoning Liang, Li Ling, Yanyi Shang, Dehan Kong

Abstract: We introduce WebChain, the largest open-source dataset of human-annotated trajectories on real-world websites, designed to accelerate reproducible research in web agents. It contains 31,725 trajectories and 318k steps, featuring a core Triple Alignment of visual, structural, and action data to provide rich, multi-modal supervision. The data is collected via a scalable pipeline that ensures coverage of complex, high-value tasks often missed by synthetic methods. Leveraging this dataset, we propose a Dual Mid-Training recipe that decouples spatial grounding from planning, achieving state-of-the-art performance on our proposed WebChainBench and other public GUI benchmarks. Our work provides the data and insights necessary to build and rigorously evaluate the next generation of scalable web agents.

Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2603.05295 [cs.AI] (or arXiv:2603.05295v1 [cs.AI] for this version)
https://doi.org/10.48550/arXiv.2603.05295
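
To make the abstract's "Triple Alignment" concrete, here is a minimal sketch of how one step pairing visual, structural, and action data might be represented. The abstract does not describe the dataset's schema, so every class and field name below is a hypothetical illustration, not WebChain's actual format. The final line is a back-of-envelope calculation from the abstract's rounded totals.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Step:
    """One interaction step. All field names are hypothetical, not WebChain's schema."""
    screenshot_path: str        # visual: image of the page at this step
    dom_snapshot: str           # structural: serialized page markup
    action_type: str            # action: e.g. "click" or "type"
    target_selector: str        # element the action applies to
    click_xy: Tuple[int, int]   # pixel coordinates on the screenshot


@dataclass
class Trajectory:
    """A task description plus its ordered steps (hypothetical container)."""
    task: str
    steps: List[Step] = field(default_factory=list)


def mean_steps(trajectories: List[Trajectory]) -> float:
    """Average number of steps per trajectory; 0.0 for an empty list."""
    if not trajectories:
        return 0.0
    return sum(len(t.steps) for t in trajectories) / len(trajectories)


# Rough average from the abstract's figures (318k steps is a rounded total).
approx_mean = 318_000 / 31_725
```

Under these figures the dataset averages roughly ten steps per trajectory, which suggests multi-step tasks rather than single-click interactions.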

Originally published on March 06, 2026. Curated by AI News.

