[2601.12349] Zero-Permission Manipulation: Can We Trust Large Multimodal Model Powered GUI Agents?
Computer Science > Cryptography and Security
arXiv:2601.12349 (cs)
[Submitted on 18 Jan 2026 (v1), last revised 3 Mar 2026 (this version, v2)]

Title: Zero-Permission Manipulation: Can We Trust Large Multimodal Model Powered GUI Agents?
Authors: Yi Qian, Kunwei Qian, Xingbang He, Ligeng Chen, Jikang Zhang, Tiantai Zhang, Haiyang Wei, Linzhang Wang, Hao Wu, Bing Mao

Abstract: Large multimodal model powered GUI agents are emerging as high-privilege operators on mobile platforms, entrusted with perceiving screen content and injecting inputs. However, their design operates under the implicit assumption of Visual Atomicity: that the UI state remains invariant between observation and action. We demonstrate that this assumption is fundamentally invalid in Android, creating a critical attack surface. We present Action Rebinding, a novel attack that allows a seemingly benign app with zero dangerous permissions to rebind an agent's execution. By exploiting the inevitable observation-to-action gap inherent in the agent's reasoning pipeline, the attacker triggers foreground transitions to rebind the agent's planned action toward the target app. We weaponize the agent's task-recovery logic and Android's UI state preservation to orchestrate programmable, multi-step attack chains. Furthermore, we introduce an Intent ...
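The Visual Atomicity violation described in the abstract can be illustrated with a toy simulation (a hypothetical sketch, not code from the paper; all names are invented): the agent observes the foreground app and plans a tap, but input injection resolves against whatever app is foreground at dispatch time, so an attacker who grabs the foreground inside the observation-to-action gap rebinds the tap.

```python
# Toy model of the observation-to-action gap exploited by Action Rebinding.
# Illustrative only; class and function names are hypothetical.

class Screen:
    """Mutable foreground state; the agent holds no lock on it."""
    def __init__(self, app):
        self.app = app

    def dispatch_tap(self, x, y):
        # Injected input is delivered to the *current* foreground app,
        # not the one that was on screen when the tap was planned.
        return f"tap({x},{y}) delivered to {self.app}"

def agent_step(screen, attacker_interrupt=None):
    # 1. Observe: the agent screenshots the UI and plans coordinates.
    observed_app = screen.app
    planned = (100, 200)  # chosen from the (now stale) screenshot
    # 2. Gap: model reasoning latency, during which a zero-permission
    #    app may trigger a foreground transition.
    if attacker_interrupt:
        attacker_interrupt(screen)
    # 3. Act: the tap lands on whatever is foreground *now*.
    return observed_app, screen.dispatch_tap(*planned)

# Benign run: observation and action agree.
print(agent_step(Screen("BankingApp")))

# Attack run: the foreground changes inside the gap, rebinding the tap.
attacked = Screen("BankingApp")
print(agent_step(attacked, lambda s: setattr(s, "app", "AttackerApp")))
```

The agent's plan is consistent with its observation in both runs; only the binding between coordinates and the receiving app changes, which is why no dangerous permission is needed on the attacker's side in this model.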