[2604.01676] GPA: Learning GUI Process Automation from Demonstrations
Computer Science > Computer Vision and Pattern Recognition
arXiv:2604.01676 (cs)
[Submitted on 2 Apr 2026 (v1), last revised 4 Apr 2026 (this version, v2)]
Title: GPA: Learning GUI Process Automation from Demonstrations
Authors: Zirui Zhao, Jun Hao Liew, Yan Yang, Wenzhuo Yang, Ziyang Luo, Doyen Sahoo, Silvio Savarese, Junnan Li
Abstract: GUI Process Automation (GPA) is a lightweight but general vision-based Robotic Process Automation (RPA) system that enables fast and stable process replay from only a single demo. Addressing the fragility of traditional RPA and the non-deterministic risks of current vision language model-based GUI agents, GPA introduces three core benefits: (1) robustness, via Sequential Monte Carlo-based localization that handles rescaling and detection uncertainty; (2) determinism and reliability, safeguarded by readiness calibration; and (3) privacy, through fast, fully local execution. This approach delivers the adaptability, robustness, and security required for enterprise workflows. It can also be used as an MCP/CLI tool by other agents with coding capabilities, so that the agent only reasons and orchestrates while GPA handles the GUI execution. We conducted a pilot experiment to compare GPA with Gemini 3 Pro (with CUA tools) and found that GPA achieves a higher success rate with 10 times faster execution in finishing long-horiz...
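The abstract names Sequential Monte Carlo-based localization but gives no algorithmic detail. As a rough illustration of the general idea only, and not the authors' implementation, the sketch below runs a small particle filter that estimates a screen translation and scale from a sequence of noisy landmark detections, then maps a demo coordinate to the current screen. The function names `smc_localize` and `project`, the state `(tx, ty, scale)`, the Gaussian noise model, and all numeric settings are assumptions made for this example.

```python
import math
import random


def project(demo_xy, estimate):
    """Map a point recorded in the demo to the current screen."""
    tx, ty, scale = estimate
    return demo_xy[0] * scale + tx, demo_xy[1] * scale + ty


def smc_localize(demo_points, detections, n_particles=2000,
                 noise_std=4.0, seed=0):
    """Estimate (tx, ty, scale) from sequential landmark detections."""
    rng = random.Random(seed)
    (px0, py0), (dx0, dy0) = demo_points[0], detections[0]

    # Proposal: draw a scale from a broad prior, then place the translation
    # so that the first landmark lands near its first detection.
    particles = []
    for _ in range(n_particles):
        s = rng.uniform(0.5, 2.0)
        particles.append((dx0 - px0 * s + rng.gauss(0.0, noise_std),
                          dy0 - py0 * s + rng.gauss(0.0, noise_std),
                          s))

    for demo_xy, (dx, dy) in zip(demo_points[1:], detections[1:]):
        # Weight each particle by a Gaussian likelihood of the detection.
        weights = []
        for p in particles:
            x, y = project(demo_xy, p)
            err2 = (x - dx) ** 2 + (y - dy) ** 2
            weights.append(math.exp(-err2 / (2.0 * noise_std ** 2)))
        if sum(weights) == 0.0:
            weights = [1.0] * n_particles  # fall back to uniform weights

        # Multinomial resampling, then a small jitter to keep diversity.
        particles = rng.choices(particles, weights=weights, k=n_particles)
        particles = [(tx + rng.gauss(0.0, 1.0),
                      ty + rng.gauss(0.0, 1.0),
                      s + rng.gauss(0.0, 0.005))
                     for tx, ty, s in particles]

    # Posterior mean of the equally weighted, resampled particles.
    n = float(n_particles)
    return (sum(p[0] for p in particles) / n,
            sum(p[1] for p in particles) / n,
            sum(p[2] for p in particles) / n)


# Hypothetical data: landmarks from a demo, re-detected on a screen that is
# scaled by 1.25 and shifted by (30, -20), with a pixel or two of error.
demo_points = [(100, 100), (400, 120), (150, 500), (420, 480)]
detections = [(156.0, 104.0), (528.0, 131.0), (218.5, 607.0), (555.0, 579.0)]

estimate = smc_localize(demo_points, detections)
print("estimated (tx, ty, scale):", estimate)
print("demo click (300, 300) maps to:", project((300, 300), estimate))
```

Keeping a population of weighted hypotheses, rather than committing to a single detection, is what lets this kind of filter absorb both rescaling and per-detection error; a real system would presumably use a richer state and observation model than this toy one.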