[2404.08567] CATP: Cross-Attention Token Pruning for Accuracy Preserved Multimodal Model Inference


Summary

The paper introduces Cross-Attention Token Pruning (CATP), a method that prunes tokens in multimodal models to reduce inference cost while preserving model accuracy, achieving significant improvements over existing pruning techniques.

Why It Matters

As multimodal models gain traction in AI applications, optimizing their performance without sacrificing accuracy is crucial. CATP addresses this need by providing a novel approach to token pruning, potentially influencing future model designs and applications in various AI fields.

Key Takeaways

  • CATP leverages cross-attention layers for effective token pruning.
  • The method achieves up to 12.1X higher accuracy compared to existing techniques.
  • It addresses the balance between computational efficiency and model precision.
  • The refined voting strategy enhances token importance determination.
  • CATP is applicable to large multimodal models like BLIP-2.
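The takeaways above describe the core mechanism: cross-attention weights are read out per head and per layer, and a voting scheme aggregates them into a token-importance score used to decide which tokens to prune. The paper's exact scoring formula is not reproduced on this page, so the following is only a minimal sketch of that idea, assuming the cross-attention weights are available as a `(layers, heads, queries, tokens)` array; the function name `catp_style_prune` and the rank-sum vote are illustrative stand-ins, not the authors' exact method.

```python
import numpy as np

def catp_style_prune(attn, keep_ratio=0.5):
    """Rank tokens by rank-sum votes over cross-attention heads and layers.

    attn: array of shape (layers, heads, queries, tokens) holding
          normalized cross-attention weights.
    Returns the (sorted) indices of the tokens to keep.
    """
    layers, heads, queries, tokens = attn.shape
    votes = np.zeros(tokens)
    # Each (layer, head) pair casts one vote: tokens that receive more
    # attention mass summed over the queries get a higher rank score.
    for l in range(layers):
        for h in range(heads):
            importance = attn[l, h].sum(axis=0)   # (tokens,)
            order = np.argsort(importance)        # ascending importance
            ranks = np.empty(tokens)
            ranks[order] = np.arange(tokens)      # rank 0 = least important
            votes += ranks
    # Keep the top fraction of tokens by accumulated votes.
    n_keep = max(1, int(round(tokens * keep_ratio)))
    kept = np.argsort(votes)[-n_keep:]
    return np.sort(kept)

# Toy example: 2 layers, 4 heads, 8 query tokens, 16 candidate tokens.
rng = np.random.default_rng(0)
attn = rng.random((2, 4, 8, 16))
attn /= attn.sum(axis=-1, keepdims=True)  # normalize rows like softmax output
kept = catp_style_prune(attn, keep_ratio=0.25)
print(len(kept))  # 4 tokens survive the 75% pruning
```

Rank-based voting (rather than averaging raw attention scores) keeps any single head with unusually peaked attention from dominating the decision, which is one plausible reading of the "refined voting strategy" the paper refers to.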

Computer Science > Computation and Language

arXiv:2404.08567 (cs) [Submitted on 2 Apr 2024 (v1), last revised 12 Feb 2026 (this version, v2)]

Title: CATP: Cross-Attention Token Pruning for Accuracy Preserved Multimodal Model Inference

Authors: Ruqi Liao, Chuqing Zhao, Jin Li, Weiqi Feng, Yi Lyu, Bingxian Chen, Haochen Yang

Abstract: In response to the rising interest in large multimodal models, we introduce Cross-Attention Token Pruning (CATP), a precision-focused token pruning method. Our approach leverages cross-attention layers in multimodal models, exemplified by BLIP-2, to extract valuable information for token importance determination. CATP employs a refined voting strategy across model heads and layers. In evaluations, CATP achieves up to 12.1X higher accuracy compared to existing token pruning methods, addressing the trade-off between computational efficiency and model precision.

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Cite as: arXiv:2404.08567 [cs.CL] (or arXiv:2404.08567v2 [cs.CL] for this version), https://doi.org/10.48550/arXiv.2404.08567

Submission history: From Weiqi Feng. [v1] Tue, 2 Apr 2024 04:35:35 UTC (2,238 KB); [v2] Thu, 12 Feb 2026 22:43:57 UTC (2,035 KB)

