[2512.02700] VLM-Pruner: Buffering for Spatial Sparsity in an Efficient VLM Centrifugal Token Pruning Paradigm
Summary
The paper presents VLM-Pruner, a training-free token pruning algorithm that improves the efficiency of vision-language models (VLMs) by balancing inter-token redundancy against spatial sparsity, reducing computational cost while preserving performance.
Why It Matters
As vision-language models become increasingly integral to applications in AI, optimizing their performance while reducing computational demands is crucial for deployment on mobile devices and in real-time applications. VLM-Pruner addresses these challenges by improving token selection processes, which can lead to more efficient AI systems.
Key Takeaways
- VLM-Pruner improves token pruning by balancing redundancy and spatial relationships.
- The algorithm achieves an 88.9% pruning rate while maintaining performance.
- A centrifugal token pruning paradigm enhances the selection process for better detail retention.
- The method includes a Buffering for Spatial Sparsity (BSS) criterion to optimize token selection.
- Comprehensive comparisons show VLM-Pruner outperforms existing methods across multiple VLMs.
Computer Science > Computer Vision and Pattern Recognition
arXiv:2512.02700 (cs)
[Submitted on 2 Dec 2025 (v1), last revised 22 Feb 2026 (this version, v3)]
Title: VLM-Pruner: Buffering for Spatial Sparsity in an Efficient VLM Centrifugal Token Pruning Paradigm
Authors: Zhenkai Wu, Xiaowen Ma, Zhenliang Ni, Dengming Zhang, Han Shu, Xin Jiang, Xinghao Chen
Abstract: Vision-language models (VLMs) excel at image understanding tasks, but the large number of visual tokens imposes significant computational costs, hindering deployment on mobile devices. Many pruning methods rely solely on token importance and thus overlook inter-token redundancy, retaining numerous duplicated tokens and wasting capacity. Although some redundancy-aware approaches have been proposed, they often ignore the spatial relationships among visual tokens. This can lead to overly sparse selections of retained tokens that fail to adequately cover the regions of target objects. To address these limitations, we propose VLM-Pruner, a training-free token pruning algorithm that explicitly balances redundancy and spatial sparsity. We introduce a centrifugal token pruning paradigm that enables near-to-far selection while prioritizing the preservation of fine-grained object details. Moreover, we design a Buffering for Spatial Sparsity (BSS) criterion ...
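The "centrifugal" near-to-far selection described in the abstract can be sketched roughly as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: the function name, the cosine-similarity threshold, and the buffer radius are hypothetical, and the exact BSS criterion in the paper may differ.

```python
import numpy as np

def centrifugal_prune(features, positions, importance, keep,
                      sim_thresh=0.9, buffer_radius=1.5):
    """Greedy near-to-far token selection (illustrative sketch).

    Starts from the most important token and expands outward,
    keeping a token only if it is not both feature-redundant with
    an already-kept token and inside that token's spatial buffer.
    """
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    seed = int(np.argmax(importance))
    kept = [seed]
    # Visit candidates in order of distance from the seed ("centrifugal").
    order = np.argsort(np.linalg.norm(positions - positions[seed], axis=1))
    for idx in order:
        if len(kept) >= keep:
            break
        if idx in kept:
            continue
        sims = feats[kept] @ feats[idx]                       # redundancy vs. kept set
        dists = np.linalg.norm(positions[kept] - positions[idx], axis=1)
        # Drop tokens that duplicate a kept token AND sit inside its buffer.
        if np.any((sims > sim_thresh) & (dists < buffer_radius)):
            continue
        kept.append(int(idx))
    # If the budget is not filled, fall back to importance ranking.
    if len(kept) < keep:
        rest = [i for i in np.argsort(-importance) if i not in kept]
        kept.extend(int(i) for i in rest[: keep - len(kept)])
    return sorted(kept)
```

Coupling the similarity test with a distance test is one plausible way to keep redundant tokens apart in feature space without scattering the retained tokens so widely that they no longer cover the target object.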