[2601.02439] WebGym: Scaling Training Environments for Visual Web Agents with Realistic Tasks
Summary
WebGym is an innovative open-source environment designed for training visual web agents, featuring nearly 300,000 tasks and a high-throughput asynchronous rollout system that enhances reinforcement learning efficiency.
Why It Matters
WebGym addresses the limitations of existing training environments by providing a large, diverse set of tasks drawn from real websites, which improves the robustness of visual web agents. Because real sites are non-stationary and varied, faster rollouts and broader task coverage translate directly into stronger agent performance in dynamic web environments.
Key Takeaways
- WebGym offers the largest open-source environment for training visual web agents with realistic tasks.
- It includes nearly 300,000 tasks evaluated with rubrics across various difficulty levels.
- The system achieves a 4-5x speedup in sampling trajectories compared to naive implementations.
- Fine-tuning a strong base vision-language model (Qwen-3-VL-8B-Instruct) on WebGym significantly improves agents' success rates on unseen tasks.
- The approach enhances the learning process for AI agents in non-stationary and diverse environments.
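The takeaways above describe a simple RL recipe: sample rollouts from the agent's own interactions, score them with task rewards, and update the policy. A minimal toy sketch of that loop is below; `Policy`, `run_episode`, and the rubric stand-in are all illustrative names, not WebGym's actual API, and a real system would update a vision-language model rather than a preference table.

```python
import random

random.seed(0)  # deterministic toy run


class Policy:
    """Toy policy over a small discrete action set (stand-in for a VLM agent)."""

    def __init__(self, actions):
        self.prefs = {a: 0.0 for a in actions}  # per-action preferences

    def sample(self):
        # Epsilon-greedy sampling: mostly exploit, occasionally explore.
        if random.random() < 0.1:
            return random.choice(list(self.prefs))
        return max(self.prefs, key=self.prefs.get)

    def update(self, action, reward, lr=0.1):
        # Reward-weighted preference update (REINFORCE-flavoured).
        self.prefs[action] += lr * reward


def run_episode(policy, task):
    """Roll out one (single-step) episode; return (trajectory, reward)."""
    action = policy.sample()
    reward = 1.0 if action == task["target"] else 0.0  # rubric stand-in
    return [(task["id"], action)], reward


def train(policy, tasks, steps=200):
    # Train on the agent's own rollouts, using task rewards as feedback.
    for _ in range(steps):
        task = random.choice(tasks)
        trajectory, reward = run_episode(policy, task)
        for _, action in trajectory:
            policy.update(action, reward)


tasks = [{"id": i, "target": "click"} for i in range(5)]
policy = Policy(["click", "scroll", "type"])
train(policy, tasks)
```

After training, the rewarded action's preference dominates the others, which is the whole point of the recipe: no supervision beyond a scalar task reward per rollout.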
Computer Science > Machine Learning
arXiv:2601.02439 (cs)
[Submitted on 5 Jan 2026 (v1), last revised 25 Feb 2026 (this version, v4)]
Title: WebGym: Scaling Training Environments for Visual Web Agents with Realistic Tasks
Authors: Hao Bai, Alexey Taymanov, Tong Zhang, Aviral Kumar, Spencer Whitehead
Abstract: We present WebGym, the largest-to-date open-source environment for training realistic visual web agents. Real websites are non-stationary and diverse, making artificial or small-scale task sets insufficient for robust policy learning. WebGym contains nearly 300,000 tasks with rubric-based evaluations across diverse, real-world websites and difficulty levels. We train agents with a simple reinforcement learning (RL) recipe, which trains on the agent's own interaction traces (rollouts), using task rewards as feedback to guide learning. To enable scaling RL, we speed up sampling of trajectories in WebGym by developing a high-throughput asynchronous rollout system, designed specifically for web agents. Our system achieves a 4-5x rollout speedup compared to naive implementations. Second, we scale the task set breadth, depth, and size, which results in continued performance improvement. Fine-tuning a strong base vision-language model, Qwen-3-VL-8B-Instruct, on WebGym results in an improvement in success rat...
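The abstract's high-throughput asynchronous rollout system can be sketched with Python's `asyncio`: many environment sessions step concurrently so the learner is never starved of trajectories. The sketch below is an assumption about the general pattern, not WebGym's implementation; `rollout`, `collect`, and all parameters are hypothetical names, and the `sleep` call stands in for a slow browser action.

```python
import asyncio


async def rollout(task_id, sem, steps=3, step_time=0.01):
    """Collect one trajectory; a semaphore bounds concurrent browser sessions."""
    async with sem:
        trajectory = []
        for step in range(steps):
            await asyncio.sleep(step_time)  # stand-in for a real browser action
            trajectory.append((task_id, step))
        return trajectory


async def collect(num_tasks=8, max_concurrent=4):
    """Launch all rollouts at once; gather overlaps their waiting time."""
    sem = asyncio.Semaphore(max_concurrent)
    return await asyncio.gather(*(rollout(i, sem) for i in range(num_tasks)))


trajectories = asyncio.run(collect())
```

Because the per-step waits overlap instead of running back to back, wall-clock time scales with `num_tasks / max_concurrent` rather than `num_tasks`, which is the source of the kind of rollout speedup the paper reports.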