[2603.24440] CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents
Computer Science > Machine Learning
arXiv:2603.24440 (cs)
[Submitted on 25 Mar 2026]
Title: CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents
Authors: Xiangru Jian, Shravan Nayak, Kevin Qinghong Lin, Aarash Feizi, Kaixin Li, Patrice Bechard, Spandana Gella, Sai Rajeswar
Abstract: Computer-use agents (CUAs) hold great promise for automating complex desktop workflows, yet progress toward general-purpose agents is bottlenecked by the scarcity of continuous, high-quality human demonstration videos. Recent work emphasizes that continuous video, not sparse screenshots, is the critical missing ingredient for scaling these agents. However, the largest existing open dataset, ScaleCUA, contains only 2 million screenshots, equating to less than 20 hours of video. To address this bottleneck, we introduce CUA-Suite, a large-scale ecosystem of expert video demonstrations and dense annotations for professional desktop computer-use agents. At its core is VideoCUA, which provides approximately 10,000 human-demonstrated tasks across 87 diverse applications with continuous 30 fps screen recordings, kinematic cursor traces, and multi-layered reasoning annotations, totaling approximately 55 hours and 6 million frames of expert video. Unlike sparse datasets that capture only final click coordi...