[2602.12510] Visual RAG Toolkit: Scaling Multi-Vector Visual Retrieval with Training-Free Pooling and Multi-Stage Search

Summary

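The Visual RAG Toolkit scales multi-vector visual retrieval with two ideas: training-free spatial pooling of patch embeddings, which produces compact tile-level and global vectors for fast candidate generation, and multi-stage search, which reranks those candidates with exact MaxSim over the full multi-vector embeddings.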

Why It Matters

This toolkit targets the main scalability problem of multi-vector visual retrievers: each page yields thousands of vectors, so indexing and search grow increasingly expensive. By shrinking what must be stored and compared, it makes late-interaction retrieval more practical for practitioners without extensive hardware, while aiming to preserve accuracy.

Key Takeaways

  • The Visual RAG Toolkit cuts stored vectors per page from thousands to dozens, which the abstract reports as a quadratic reduction in vector-to-vector comparisons.
  • It employs training-free pooling and multi-stage retrieval to maintain accuracy while improving throughput.
  • The toolkit includes robust preprocessing features, facilitating easier integration into existing workflows.
  • Performance is optimized for common retrieval cutoffs, lowering hardware barriers for users.
  • The approach is validated through experiments, demonstrating minimal degradation in retrieval quality.
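
To make the scale of the first takeaway concrete, the arithmetic can be sketched as follows. This is a rough illustration, not the paper's cost model: it counts only query-to-page dot products under MaxSim, and every number in it (query length, vectors per page, corpus size, rerank depth) is an assumption chosen for the example rather than a figure from the paper.

```python
def maxsim_comparisons(query_vectors: int, page_vectors: int, pages: int) -> int:
    """Dot products needed to score `pages` pages with exhaustive MaxSim."""
    return query_vectors * page_vectors * pages


# Illustrative values only; none of these come from the paper.
QUERY_VECTORS = 20          # vectors in one query
FULL_PAGE_VECTORS = 1024    # "thousands" of patch vectors per page
POOLED_PAGE_VECTORS = 32    # "dozens" of pooled vectors per page
CORPUS_PAGES = 10_000
RERANK_CANDIDATES = 100     # pages passed on to exact reranking

# Single stage: exact MaxSim against every page in the corpus.
full_cost = maxsim_comparisons(QUERY_VECTORS, FULL_PAGE_VECTORS, CORPUS_PAGES)

# Two stages: pooled vectors over the whole corpus, full vectors on candidates only.
stage1_cost = maxsim_comparisons(QUERY_VECTORS, POOLED_PAGE_VECTORS, CORPUS_PAGES)
stage2_cost = maxsim_comparisons(QUERY_VECTORS, FULL_PAGE_VECTORS, RERANK_CANDIDATES)
two_stage_cost = stage1_cost + stage2_cost

print(f"single-stage: {full_cost:,} dot products")
print(f"two-stage:    {two_stage_cost:,} dot products")
print(f"ratio:        {full_cost / two_stage_cost:.1f}x fewer")
```

Under these assumed numbers, almost all of the remaining two-stage cost sits in the pooled first pass, which is why the number of vectors kept per page matters so much.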

Computer Science > Information Retrieval
arXiv:2602.12510 (cs)
[Submitted on 13 Feb 2026]

Title: Visual RAG Toolkit: Scaling Multi-Vector Visual Retrieval with Training-Free Pooling and Multi-Stage Search

Authors: Ara Yeroyan

Abstract: Multi-vector visual retrievers (e.g., ColPali-style late interaction models) deliver strong accuracy, but scale poorly because each page yields thousands of vectors, making indexing and search increasingly expensive. We present Visual RAG Toolkit, a practical system for scaling visual multi-vector retrieval with training-free, model-aware pooling and multi-stage retrieval. Motivated by Matryoshka Embeddings, our method performs static spatial pooling - including a lightweight sliding-window averaging variant - over patch embeddings to produce compact tile-level and global representations for fast candidate generation, followed by exact MaxSim reranking using full multi-vector embeddings. Our design yields a quadratic reduction in vector-to-vector comparisons by reducing stored vectors per page from thousands to dozens, notably without requiring post-training, adapters, or distillation. Across experiments with interaction-style models such as ColPali and ColSmol-500M, we observe that over the limited ViDoRe v2 benchmark corpus 2-stage retrieval typically preserves ...
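
The pipeline the abstract describes (pool patch embeddings, generate candidates from the pooled vectors, rerank with exact MaxSim) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the toolkit's implementation: it uses only non-overlapping tile averaging (not the sliding-window or model-aware variants), it scores the first stage with MaxSim over the pooled vectors, and all function names, the grid layout, and the toy data are hypothetical. A practical system would use an array library and a vector index instead of nested lists.

```python
from typing import List

Vector = List[float]
PatchGrid = List[List[Vector]]  # rows x cols grid of patch embeddings (assumed layout)


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise average of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def tile_pool(grid: PatchGrid, tile: int) -> List[Vector]:
    """Average non-overlapping tile x tile blocks of the patch grid."""
    rows, cols = len(grid), len(grid[0])
    pooled = []
    for r in range(0, rows, tile):
        for c in range(0, cols, tile):
            block = [grid[i][j]
                     for i in range(r, min(r + tile, rows))
                     for j in range(c, min(c + tile, cols))]
            pooled.append(mean_vector(block))
    return pooled


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def maxsim(query: List[Vector], doc: List[Vector]) -> float:
    """Late-interaction score: best-matching doc vector per query vector, summed."""
    return sum(max(dot(q, d) for d in doc) for q in query)


def flatten(grid: PatchGrid) -> List[Vector]:
    return [vector for row in grid for vector in row]


def two_stage_search(query: List[Vector], grids: List[PatchGrid],
                     pooled_index: List[List[Vector]],
                     n_candidates: int, top_k: int) -> List[int]:
    """Stage 1: rank all pages on pooled vectors. Stage 2: exact MaxSim on survivors."""
    coarse = sorted(range(len(grids)),
                    key=lambda i: maxsim(query, pooled_index[i]),
                    reverse=True)[:n_candidates]
    reranked = sorted(coarse,
                      key=lambda i: maxsim(query, flatten(grids[i])),
                      reverse=True)
    return reranked[:top_k]


# Toy corpus: three pages, each a 2x2 grid of 2-dimensional patch embeddings.
pages = [
    [[[1.0, 0.0], [1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]]],  # uniform weak match
    [[[0.0, 1.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]]],  # no match
    [[[2.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]],  # one strong patch
]
pooled_index = [tile_pool(grid, tile=2) for grid in pages]  # built once at index time
query = [[1.0, 0.0]]
result = two_stage_search(query, pages, pooled_index, n_candidates=2, top_k=2)
print(result)
```

In this toy corpus, averaging dilutes the single strong patch on the third page, so it ranks second in the pooled stage and moves to first only after exact reranking. That is the trade-off the two-stage design relies on: the pooled pass needs to keep the right pages among the candidates, not to order them perfectly.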
