[2509.06415] Index-Preserving Lightweight Token Pruning for Efficient Document Understanding in Vision-Language Models
Computer Science > Computer Vision and Pattern Recognition
arXiv:2509.06415 (cs)
[Submitted on 8 Sep 2025 (v1), last revised 4 Mar 2026 (this version, v2)]

Title: Index-Preserving Lightweight Token Pruning for Efficient Document Understanding in Vision-Language Models
Authors: Jaemin Son, Sujin Choi, Inyong Yun

Abstract: Recent progress in vision-language models (VLMs) has led to impressive results on document understanding tasks, but their high computational demands remain a challenge. To mitigate this compute burden, we propose a lightweight token pruning framework that filters out non-informative background regions from document images before VLM processing. A binary patch-level classifier removes non-text areas, and a max-pooling refinement step recovers fragmented text regions to enhance spatial coherence. Experiments on real-world document datasets demonstrate that our approach substantially lowers computational costs while maintaining comparable accuracy.

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as: arXiv:2509.06415 [cs.CV] (or arXiv:2509.06415v2 [cs.CV] for this version)
DOI: https://doi.org/10.48550/arXiv.2509.06415
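The pipeline the abstract describes — classify patches as text/background, dilate the resulting mask with max pooling to reconnect fragmented text regions, then keep only the surviving tokens along with their original positions — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the patch classifier is abstracted as a precomputed binary mask, and the function names and 3x3 pooling window are assumptions for the example.

```python
# Hedged sketch of index-preserving patch pruning with max-pooling
# refinement. The binary patch classifier from the paper is replaced
# here by a precomputed mask; kernel size k=3 is illustrative.
import numpy as np

def refine_mask(mask: np.ndarray, k: int = 3) -> np.ndarray:
    """Max-pool (dilate) a binary patch mask with a k x k window,
    so isolated text patches pull in their neighbors and fragmented
    text regions become spatially coherent."""
    h, w = mask.shape
    pad = k // 2
    padded = np.pad(mask, pad, mode="constant")
    out = np.zeros_like(mask)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + k, j:j + k].max()
    return out

def prune_tokens(tokens: np.ndarray, mask: np.ndarray):
    """Drop tokens whose patch is marked background. Returns the kept
    tokens together with their original flattened patch indices, so
    positional information is preserved for the downstream VLM."""
    keep = np.flatnonzero(refine_mask(mask).ravel())
    return tokens[keep], keep

# Toy usage: a 4x4 patch grid with one "text" patch at (1, 1).
mask = np.zeros((4, 4), dtype=np.uint8)
mask[1, 1] = 1
tokens = np.arange(16).reshape(16, 1)       # one dummy token per patch
kept, indices = prune_tokens(tokens, mask)  # 9 tokens survive dilation
```

Because the returned `indices` refer to positions in the original patch grid, the pruned token sequence can still be paired with the correct positional embeddings — the "index-preserving" property in the title.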