[2503.15937] Advancing Mobile GUI Agents: A Verifier-Driven Approach to Practical Deployment
Summary
The paper introduces V-Droid, a mobile GUI task automation agent that uses LLMs as verifiers to evaluate candidate actions, rather than as generators that produce actions directly, and reports higher task success rates than existing agents on three public benchmarks.
Why It Matters
Earlier mobile agents use Large Language Models (LLMs) as generators that directly produce an action at each step. V-Droid instead uses the LLM as a verifier that evaluates candidate actions before a final decision is made. The reported gains in task success rate and per-step latency suggest that this design could make mobile task automation more reliable and faster in practical deployment.
Key Takeaways
- V-Droid employs LLMs as verifiers to enhance decision-making in mobile GUI tasks.
- The agent achieves task success rates of 59.5% on AndroidWorld, 38.3% on AndroidLab, and 49% on MobileAgentBench, surpassing existing agents by 5.2%, 2.1%, and 9%, respectively.
- V-Droid operates with a low latency of 4.3 seconds per step, significantly faster than its predecessors.
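- The framework combines discretized action space construction with a prefilling-only workflow, pair-wise progress preference training, and a scalable human-agent joint annotation scheme for data collection.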
- This approach could make mobile task automation more practical to deploy and improve user interactions.
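The verifier-driven idea in the takeaways can be illustrated with a minimal sketch: enumerate a discrete set of candidate actions, score each with a verifier, and execute the highest-scoring one. Everything named here (`Action`, `select_action`, `toy_verifier`) is hypothetical and not taken from the paper; in particular, `toy_verifier` is a keyword-overlap stand-in for what would be an LLM call in a real agent.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Action:
    """One discrete candidate action (hypothetical schema for this sketch)."""
    kind: str    # e.g. "click", "type", "navigate_back"
    target: str  # e.g. a UI element label; empty if not applicable


# A verifier maps (task, screen description, candidate action) to a score.
# Assumption: a real agent would call an LLM here; any callable works.
Verifier = Callable[[str, str, Action], float]


def select_action(task: str, screen: str,
                  candidates: Sequence[Action],
                  verifier: Verifier) -> Action:
    """Score every candidate with the verifier and return the best one."""
    if not candidates:
        raise ValueError("no candidate actions to verify")
    # max() returns the first candidate among ties.
    return max(candidates, key=lambda a: verifier(task, screen, a))


def toy_verifier(task: str, screen: str, action: Action) -> float:
    """Stand-in scorer: counts words shared by the task and the target."""
    task_words = set(task.lower().split())
    target_words = set(action.target.lower().split())
    return float(len(task_words & target_words))


if __name__ == "__main__":
    candidates = [
        Action("click", "Settings"),
        Action("click", "Wi-Fi toggle"),
        Action("navigate_back", ""),
    ]
    best = select_action("turn on wi-fi", "home screen",
                         candidates, toy_verifier)
    print(best)
```

Because the candidates form a fixed, finite set, each one can be scored independently, which is what makes a verifier (rather than a free-form generator) applicable at each step.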
Computer Science > Artificial Intelligence
arXiv:2503.15937 (cs)
[Submitted on 20 Mar 2025 (v1), last revised 21 Feb 2026 (this version, v5)]
Title: Advancing Mobile GUI Agents: A Verifier-Driven Approach to Practical Deployment
Authors: Gaole Dai, Shiqi Jiang, Ting Cao, Yuanchun Li, Yuqing Yang, Rui Tan, Mo Li, Lili Qiu
Abstract: We propose V-Droid, a mobile GUI task automation agent. Unlike previous mobile agents that utilize Large Language Models (LLMs) as generators to directly generate actions at each step, V-Droid employs LLMs as verifiers to evaluate candidate actions before making final decisions. To realize this novel paradigm, we introduce a comprehensive framework for constructing verifier-driven mobile agents: the discretized action space construction coupled with the prefilling-only workflow to accelerate the verification process, the pair-wise progress preference training to significantly enhance the verifier's decision-making capabilities, and the scalable human-agent joint annotation scheme to efficiently collect the necessary data at scale. V-Droid obtains a substantial task success rate across several public mobile task automation benchmarks: 59.5% on AndroidWorld, 38.3% on AndroidLab, and 49% on MobileAgentBench, surpassing existing agents by 5.2%, 2.1%, and 9%, respectively. Furthermor...
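The abstract names pair-wise progress preference training but the text available here does not state the training objective. As a hedged illustration of what pairwise preference training generally looks like, the sketch below uses a standard Bradley-Terry style ranking loss, which is an assumption and not necessarily the paper's loss. The function name and its parameters are hypothetical.

```python
import math


def pairwise_preference_loss(score_preferred: float,
                             score_rejected: float) -> float:
    """Generic pairwise ranking loss: -log(sigmoid(preferred - rejected)).

    Assumption: this is a textbook Bradley-Terry style objective, used
    here only to illustrate the idea of preferring the action that makes
    progress over one that does not.
    """
    margin = score_preferred - score_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch on the sign of m
    # so that exp() never receives a large positive argument.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # The loss shrinks as the preferred action's score exceeds the other.
    for preferred, rejected in [(0.0, 2.0), (0.0, 0.0), (2.0, 0.0)]:
        print(preferred, rejected,
              pairwise_preference_loss(preferred, rejected))
```

Minimizing such a loss over annotated pairs pushes the verifier to assign a higher score to the action in each pair that is labeled as making more progress on the task.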