[2603.00188] Efficient Long-Horizon GUI Agents via Training-Free KV Cache Compression
Computer Science > Computer Vision and Pattern Recognition
arXiv:2603.00188 (cs)
[Submitted on 27 Feb 2026]

Title: Efficient Long-Horizon GUI Agents via Training-Free KV Cache Compression
Authors: Bowen Zhou, Zhou Xu, Wanli Li, Jingyu Xiao, Haoqian Wang

Abstract: Large Vision-Language Models (VLMs) have emerged as powerful engines for autonomous GUI agents, yet their deployment is severely constrained by the substantial memory footprint and latency of the Key-Value (KV) cache during long-horizon interactions. While existing cache compression methods have proven effective for LLMs, we empirically demonstrate that they perform suboptimally in GUI scenarios due to a fundamental misalignment: unlike general visual tasks, where attention sparsity varies across layers, GUI attention patterns exhibit uniformly high sparsity across all transformer layers. Motivated by this insight, we propose ST-Lite, a training-free KV cache compression framework tailored for efficient GUI agents that explicitly addresses the dynamic spatio-trajectory dependencies within GUI data streams. ST-Lite introduces a novel dual-branch scoring policy incorporating Component-centric Spatial Saliency (CSS) and Trajectory-aware Semantic Gating (TSG). Specifically, CSS preserves the structural integrity of interactive UI elements by eva...
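The abstract does not spell out ST-Lite's scoring mechanics, but the general family it belongs to (training-free, score-based KV cache eviction) can be illustrated with a minimal sketch. The function and its parameters below (`compress_kv_cache`, `keep_ratio`) are hypothetical names for illustration, not the paper's API: cached tokens are ranked by the attention they accumulate from recent queries, and only the top fraction is retained.

```python
import numpy as np

def compress_kv_cache(keys, values, attn_weights, keep_ratio=0.25):
    """Generic training-free KV cache compression by attention-score eviction.

    keys, values: (seq_len, head_dim) cached tensors for one head/layer.
    attn_weights: (num_queries, seq_len) attention weights that recent
        queries placed on the cached tokens.
    keep_ratio: fraction of cached tokens to retain.

    Returns the compressed keys/values and the retained token indices.
    """
    seq_len = keys.shape[0]
    keep = max(1, int(seq_len * keep_ratio))
    # Accumulate the attention mass each cached token received.
    scores = attn_weights.sum(axis=0)
    # Keep the top-`keep` tokens, preserving their original order so
    # positional structure of the retained sequence is unchanged.
    idx = np.sort(np.argsort(scores)[-keep:])
    return keys[idx], values[idx], idx

# Example: compress a 16-token cache down to 4 tokens.
rng = np.random.default_rng(0)
keys = rng.normal(size=(16, 8))
values = rng.normal(size=(16, 8))
attn = rng.random((4, 16))
k_small, v_small, kept = compress_kv_cache(keys, values, attn, keep_ratio=0.25)
```

Because GUI attention is described as uniformly highly sparse across layers, a policy like this could use a single aggressive `keep_ratio` for every layer, whereas methods tuned for general visual tasks typically vary the budget per layer.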