[2602.13653] Building Autonomous GUI Navigation via Agentic-Q Estimation and Step-Wise Policy Optimization
Summary
The paper presents an MLLM-centered framework for autonomous GUI navigation that combines agentic-Q estimation with step-wise policy optimization, targeting GUI agents that operate in non-stationary environments.
Why It Matters
GUI agents often operate in non-stationary environments, which makes data curation and policy optimization computationally expensive. The framework targets both costs: the policy generates its own state-action trajectories, which keeps data collection manageable, and policy updates are applied step-wise. If the approach generalizes, it could make GUI agents more practical to train and maintain in real applications.
Key Takeaways
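- Introduces a two-component, MLLM-centered framework for GUI navigation.
- Agentic-Q estimation trains a Q-model that outputs step-wise values scoring how much a given action contributes to task completion.
- Step-wise policy optimization takes step-wise samples from state-action trajectories and updates the policy via reinforcement learning, using the agentic-Q model.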
- Empirical results show superior performance compared to larger models.
- Framework reduces data collection costs and stabilizes policy updates.
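The interplay between the two components listed above can be illustrated with a toy sketch. This is not the paper's implementation: the paper uses MLLMs for both the Q-model and the policy, and the excerpt here does not give its actual training objectives. Everything below is an assumption made for illustration, including the hypothetical `ACTIONS`, the `GOAL` screen index, the row-of-screens environment, a tabular running-mean Q estimate, and a softmax policy over per-state logits.

```python
import math
import random
from collections import defaultdict

ACTIONS = ["click_back", "click_next"]  # hypothetical GUI actions
GOAL = 3                                # hypothetical target screen index

def step(state, action):
    """Toy environment (assumption): screens 0..GOAL in a row."""
    nxt = min(state + 1, GOAL) if action == "click_next" else max(state - 1, 0)
    return nxt, nxt == GOAL

def policy_probs(logits, state):
    """Softmax over the per-state action logits."""
    zs = [logits[(state, a)] for a in ACTIONS]
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def rollout(logits, rng, max_steps=8):
    """Collect one trajectory with the current policy itself."""
    state, traj = 0, []
    for _ in range(max_steps):
        probs = policy_probs(logits, state)
        action = rng.choices(ACTIONS, weights=probs)[0]
        traj.append((state, action))
        state, done = step(state, action)
        if done:
            return traj, 1.0  # task completed
    return traj, 0.0          # step budget exhausted

def update_q(q, counts, traj, outcome):
    """Stand-in for Q estimation: running mean of the task outcome
    for every (state, action) pair seen in the trajectory."""
    for sa in traj:
        counts[sa] += 1
        q[sa] += (outcome - q[sa]) / counts[sa]

def update_policy(logits, q, traj, lr=0.5):
    """Step-wise update: each (state, action) sample is handled on its own,
    weighted by its Q-value minus the state's expected Q under the policy."""
    for state, action in traj:
        probs = policy_probs(logits, state)
        baseline = sum(p * q[(state, a)] for p, a in zip(probs, ACTIONS))
        advantage = q[(state, action)] - baseline
        for p, a in zip(probs, ACTIONS):
            grad = (1.0 if a == action else 0.0) - p  # d log pi / d logit
            logits[(state, a)] += lr * advantage * grad

def train(iterations=300, seed=0):
    """Alternate between on-policy rollouts, Q updates, and policy updates."""
    rng = random.Random(seed)
    logits, q, counts = defaultdict(float), defaultdict(float), defaultdict(int)
    for _ in range(iterations):
        traj, outcome = rollout(logits, rng)
        update_q(q, counts, traj, outcome)
        update_policy(logits, q, traj)
    return logits, q
```

Calling `train()` returns the learned logits and Q-table; `policy_probs(logits, state)` then gives the action distribution for a screen. The point of the sketch is the data flow: trajectories come from the current policy, the Q estimate assigns a value to each individual step, and the policy is updated one step-wise sample at a time rather than once per whole trajectory.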
Computer Science > Artificial Intelligence
arXiv:2602.13653 (cs)
[Submitted on 14 Feb 2026]
Title: Building Autonomous GUI Navigation via Agentic-Q Estimation and Step-Wise Policy Optimization
Authors: Yibo Wang, Guangda Huzhang, Yuwei Hu, Yu Xia, Shiyin Lu, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, Lijun Zhang
Abstract: Recent advances in Multimodal Large Language Models (MLLMs) have substantially driven the progress of autonomous agents for Graphical User Interfaces (GUIs). Nevertheless, in real-world applications, GUI agents often face non-stationary environments, leading to high computational costs for data curation and policy optimization. In this report, we introduce a novel MLLM-centered framework for GUI agents, which consists of two components: agentic-Q estimation and step-wise policy optimization. The former aims to optimize a Q-model that can generate step-wise values to evaluate the contribution of a given action to task completion. The latter takes step-wise samples from the state-action trajectory as inputs and optimizes the policy via reinforcement learning with our agentic-Q model. Notably, (i) all state-action trajectories are produced by the policy itself, so that the data collection costs are manageable; (ii) the policy update is d...