[2603.00188] Efficient Long-Horizon GUI Agents via Training-Free KV Cache Compression
Computer Science > Computer Vision and Pattern Recognition
arXiv:2603.00188 (cs)
[Submitted on 27 Feb 2026]

Title: Efficient Long-Horizon GUI Agents via Training-Free KV Cache Compression
Authors: Bowen Zhou, Zhou Xu, Wanli Li, Jingyu Xiao, Haoqian Wang

Abstract: Large Vision-Language Models (VLMs) have emerged as powerful engines for autonomous GUI agents, yet their deployment is severely constrained by the substantial memory footprint and latency of the Key-Value (KV) cache during long-horizon interactions. While existing cache compression methods have proven effective for LLMs, we empirically demonstrate that they perform suboptimally in GUI scenarios due to a fundamental misalignment: unlike general visual tasks, where attention sparsity varies across layers, GUI attention patterns exhibit uniformly high sparsity across all transformer layers. Motivated by this insight, we propose ST-Lite, a training-free KV cache compression framework tailored for efficient GUI agents that explicitly addresses the dynamic spatio-trajectory dependencies within GUI data streams. ST-Lite introduces a novel dual-branch scoring policy incorporating Component-centric Spatial Saliency (CSS) and Trajectory-aware Semantic Gating (TSG). Specifically, CSS preserves the structural integrity of interactive UI elements by eva...
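The abstract does not spell out ST-Lite's scoring mechanics, but the general family it belongs to (training-free, score-based KV cache eviction) can be illustrated with a minimal sketch. The function and its parameters below (`compress_kv_cache`, `keep_ratio`) are hypothetical names for illustration, not the paper's API: cached tokens are ranked by the attention they accumulate from recent queries, and only the top fraction is retained.

```python
import numpy as np

def compress_kv_cache(keys, values, attn_weights, keep_ratio=0.25):
    """Generic training-free KV cache compression by attention-score eviction.

    keys, values: (seq_len, head_dim) cached tensors for one head/layer.
    attn_weights: (num_queries, seq_len) attention weights that recent
        queries placed on the cached tokens.
    keep_ratio: fraction of cached tokens to retain.

    Returns the compressed keys/values and the retained token indices.
    """
    seq_len = keys.shape[0]
    keep = max(1, int(seq_len * keep_ratio))
    # Accumulate the attention mass each cached token received.
    scores = attn_weights.sum(axis=0)
    # Keep the top-`keep` tokens, preserving their original order so
    # positional structure of the retained sequence is unchanged.
    idx = np.sort(np.argsort(scores)[-keep:])
    return keys[idx], values[idx], idx

# Example: compress a 16-token cache down to 4 tokens.
rng = np.random.default_rng(0)
keys = rng.normal(size=(16, 8))
values = rng.normal(size=(16, 8))
attn = rng.random((4, 16))
k_small, v_small, kept = compress_kv_cache(keys, values, attn, keep_ratio=0.25)
```

Because GUI attention is described as uniformly highly sparse across layers, a policy like this could use a single aggressive `keep_ratio` for every layer, whereas methods tuned for general visual tasks typically vary the budget per layer.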