[2602.23235] Spatio-Temporal Token Pruning for Efficient High-Resolution GUI Agents
Summary
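The paper presents GUIPruner, a training-free framework that makes high-resolution GUI agents more efficient by removing spatiotemporal redundancy: it downsizes older screenshots in the interaction history and prunes visual tokens in a way that preserves the screen's grid structure.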
Why It Matters
Pure-vision GUI agents must encode high-resolution screenshots together with a history of earlier steps, which makes inference expensive. This work targets that efficiency bottleneck, so it is relevant to developers and researchers in AI and computer vision who build real-time agents or deploy them on resource-constrained hardware.
Key Takeaways
- GUIPruner introduces a training-free framework for efficient GUI navigation.
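- It combines Temporal-Adaptive Resolution (TAR), which shrinks older screenshots through decay-based resizing, with Stratified Structure-aware Pruning (SSP), which keeps interactive foregrounds and semantic anchors while safeguarding the global layout.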
- The method achieves a 3.4x reduction in FLOPs and a 3.3x speedup in encoding latency.
- Over 94% of original performance is retained despite high compression.
- The findings support real-time, high-precision navigation in resource-constrained environments.
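The decay-based resizing idea behind TAR can be illustrated with a small sketch. This is not the authors' implementation: the exponential schedule, the `decay` and `min_scale` parameters, the 14-pixel patch size, and all function names are assumptions chosen for illustration. It only shows how giving older screenshots a lower resolution reduces the number of visual tokens in the history.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """A screenshot described only by its pixel dimensions."""
    width: int
    height: int


def decay_scale(age: int, decay: float = 0.7, min_scale: float = 0.25) -> float:
    """Hypothetical schedule: scale for a frame `age` steps old (0 = current)."""
    if age < 0:
        raise ValueError("age must be non-negative")
    return max(min_scale, decay ** age)


def resized_dims(frame: Frame, scale: float, patch: int = 14) -> tuple[int, int]:
    """Scale both sides and snap each down to a multiple of `patch` (at least one patch)."""
    def snap(side: int) -> int:
        return max(patch, int(side * scale) // patch * patch)
    return snap(frame.width), snap(frame.height)


def token_count(width: int, height: int, patch: int = 14) -> int:
    """Number of patch tokens for an image whose sides are multiples of `patch`."""
    return (width // patch) * (height // patch)


def plan_history(frames: list[Frame], decay: float = 0.7,
                 min_scale: float = 0.25, patch: int = 14) -> list[tuple[int, int]]:
    """Target size per frame; `frames` is ordered oldest to newest."""
    newest = len(frames) - 1
    return [
        resized_dims(f, decay_scale(newest - i, decay, min_scale), patch)
        for i, f in enumerate(frames)
    ]


if __name__ == "__main__":
    history = [Frame(1120, 560)] * 3
    sizes = plan_history(history, decay=0.5)
    uniform = sum(token_count(f.width, f.height) for f in history)
    adaptive = sum(token_count(w, h) for w, h in sizes)
    print(sizes, uniform, adaptive)
```

In this toy setting the current screenshot keeps its full resolution while each earlier one is scaled down further, so most of the token budget goes to the most recent screen. The real schedule and its interaction with SSP are described in the paper itself.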
Computer Science > Computer Vision and Pattern Recognition
arXiv:2602.23235 (cs)
[Submitted on 26 Feb 2026]
Title: Spatio-Temporal Token Pruning for Efficient High-Resolution GUI Agents
Authors: Zhou Xu, Bowen Zhou, Qi Wang, Shuwen Feng, Jingyu Xiao
Abstract: Pure-vision GUI agents provide universal interaction capabilities but suffer from severe efficiency bottlenecks due to the massive spatiotemporal redundancy inherent in high-resolution screenshots and historical trajectories. We identify two critical misalignments in existing compression paradigms: the temporal mismatch, where uniform history encoding diverges from the agent's "fading memory" attention pattern, and the spatial topology conflict, where unstructured pruning compromises the grid integrity required for precise coordinate grounding, inducing spatial hallucinations. To address these challenges, we introduce GUIPruner, a training-free framework tailored for high-resolution GUI navigation. It synergizes Temporal-Adaptive Resolution (TAR), which eliminates historical redundancy via decay-based resizing, and Stratified Structure-aware Pruning (SSP), which prioritizes interactive foregrounds and semantic anchors while safeguarding global layout. Extensive evaluations across diverse benchmarks demonstrate that GUIPrun...