[2604.05253] Spike Hijacking in Late-Interaction Retrieval

arXiv - Machine Learning · 3 min read

About this article

Subject: Computer Science > Information Retrieval
arXiv:2604.05253 (cs)
Submitted on 6 Apr 2026

Title: Spike Hijacking in Late-Interaction Retrieval

Authors: Karthik Suresh, Tushar Vatsa, Tracy King, Asim Kadav, Michael Friedrich

Abstract: Late-interaction retrieval models rely on hard maximum similarity (MaxSim) to aggregate token-level similarities. Although effective, this winner-take-all pooling rule may structurally bias training dynamics. We provide a mechanistic study of gradient routing and robustness in MaxSim-based retrieval. In a controlled synthetic environment with in-batch contrastive training, we demonstrate that MaxSim induces significantly higher patch-level gradient concentration than smoother alternatives such as Top-k pooling and softmax aggregation. While sparse routing can improve early discrimination, it also increases sensitivity to document length: as the number of document patches grows, MaxSim degrades more sharply than mild smoothing variants. We corroborate these findings on a real-world multi-vector retrieval benchmark, where controlled document-length sweeps reveal similar brittleness under hard max pooling. Together, our results isolate pooling-induced gradient concentration as a structural property of late-interaction retrieval and highlight a sparsity-robustness tradeoff. These findings motivate principled ...
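
The three pooling rules named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names, the exact forms of Top-k pooling (mean of the k best similarities) and softmax aggregation (temperature-scaled log-sum-exp), the temperature value, and the `active_fraction` measure are all assumptions chosen for illustration. The input `sim` is assumed to be a list of rows, one per query token, holding that token's similarity to each document patch.

```python
import math


def maxsim_score(sim):
    """Hard MaxSim: each query token keeps only its best-matching patch."""
    return sum(max(row) for row in sim)


def topk_score(sim, k=2):
    """Top-k pooling (assumed form): average the k best patches per query token."""
    return sum(sum(sorted(row)[-k:]) / min(k, len(row)) for row in sim)


def softmax_weights(row, temperature=0.1):
    """Softmax over one row of similarities, shifted by the max for stability."""
    peak = max(row)
    exps = [math.exp((s - peak) / temperature) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


def smooth_max_score(sim, temperature=0.1):
    """Softmax-style aggregation (assumed form): temperature-scaled log-sum-exp."""
    score = 0.0
    for row in sim:
        peak = max(row)
        lse = math.log(sum(math.exp((s - peak) / temperature) for s in row))
        score += peak + temperature * lse
    return score


def routing_weights(sim, mode, k=2, temperature=0.1):
    """Gradient of the score with respect to each similarity entry.

    A nonzero weight means that (query token, patch) pair receives gradient.
    """
    weights = []
    for row in sim:
        n = len(row)
        if mode == "maxsim":
            best = row.index(max(row))  # ties go to the first maximum
            weights.append([1.0 if j == best else 0.0 for j in range(n)])
        elif mode == "topk":
            kk = min(k, n)
            top = sorted(range(n), key=lambda j: row[j])[-kk:]
            weights.append([1.0 / kk if j in top else 0.0 for j in range(n)])
        elif mode == "softmax":
            # The gradient of temperature-scaled log-sum-exp is the softmax.
            weights.append(softmax_weights(row, temperature))
        else:
            raise ValueError(f"unknown mode: {mode}")
    return weights


def active_fraction(weights, eps=1e-12):
    """Share of (query token, patch) pairs with non-negligible gradient.

    Illustrative proxy for gradient concentration, not the paper's metric.
    """
    flat = [w for row in weights for w in row]
    return sum(w > eps for w in flat) / len(flat)


if __name__ == "__main__":
    # Two query tokens scored against three document patches (made-up values).
    sim = [[0.9, 0.1, 0.5], [0.2, 0.8, 0.7]]
    for mode in ("maxsim", "topk", "softmax"):
        print(mode, active_fraction(routing_weights(sim, mode)))
```

Under these assumptions, hard MaxSim sends gradient to exactly one patch per query token, Top-k spreads it evenly over k patches, and the softmax variant gives every patch a share that shrinks with its similarity gap. That is the sense in which the smoother rules route gradient less sparsely.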

Originally published on April 08, 2026. Curated by AI News.
