[2509.21764] CubistMerge: Spatial-Preserving Token Merging For Diverse ViT Backbones

arXiv:2509.21764 (cs), Computer Science > Computer Vision and Pattern Recognition
Submitted on 26 Sep 2025 (v1), last revised 2 Mar 2026 (this version, v2)

Title: CubistMerge: Spatial-Preserving Token Merging For Diverse ViT Backbones

Authors: Wenyi Gong, Mieszko Lis

Abstract: Many modern ViT backbones adopt spatial architectural designs, such as window attention, decomposed relative positional embeddings in SAM, and RoPE in DINOv3. Such architectures impose new challenges on token reduction, as the vast majority of existing methods fail to preserve the spatial structure these architectures depend on. In this paper, we introduce a simple yet effective token merging method that maintains spatial integrity, enabling seamless compatibility with spatial architectures. We reconcile two seemingly conflicting requirements: (i) exploiting the uneven information distribution across the spatial layout while (ii) preserving the spatial structure post-merging. Our approach employs (i) a 2D reduction strategy to enforce structured token layouts, (ii) a spatial-aware merging algorithm that maintains relative token positions, and (iii) a novel max-magnitude-per-dimension token representation that preserves salient features. Our method demonstrates strong performance both off-the-shelf and with fine-tuning, achieving state-of-the-a...
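The abstract names a "max-magnitude-per-dimension token representation" but does not spell out how it is computed. One plausible reading, offered here only as an illustrative sketch and not as the authors' implementation, is that when several tokens are merged, each feature dimension of the merged token takes the value with the largest absolute magnitude among the inputs, with its sign preserved. The function name `max_magnitude_merge` and the plain-list token format are assumptions made for this example.

```python
from typing import List, Sequence


def max_magnitude_merge(tokens: Sequence[Sequence[float]]) -> List[float]:
    """Merge token vectors by keeping, per feature dimension, the value
    with the largest absolute magnitude (sign preserved).

    Hypothetical helper: an interpretation of the abstract's wording,
    not code from the paper.
    """
    if not tokens:
        raise ValueError("need at least one token to merge")
    dim = len(tokens[0])
    if any(len(t) != dim for t in tokens):
        raise ValueError("all tokens must have the same dimensionality")
    # zip(*tokens) yields one tuple per feature dimension;
    # max(..., key=abs) selects the entry with the largest |value|.
    return [max(column, key=abs) for column in zip(*tokens)]


def average_merge(tokens: Sequence[Sequence[float]]) -> List[float]:
    """Plain averaging, shown only for contrast."""
    return [sum(column) / len(column) for column in zip(*tokens)]


# Two toy 3-dimensional tokens (made-up values).
token_a = [0.1, -2.0, 0.5]
token_b = [-0.3, 0.4, 0.5]

merged = max_magnitude_merge([token_a, token_b])
averaged = average_merge([token_a, token_b])
print(merged)
print(averaged)
```

Under this reading, the strong activation of -2.0 in the second dimension survives the merge intact, whereas averaging would shrink it toward zero. That is one way a merged token could "preserve salient features" as the abstract puts it.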

Originally published on March 03, 2026. Curated by AI News.
