[2602.22427] HubScan: Detecting Hubness Poisoning in Retrieval-Augmented Generation Systems

arXiv - AI · 4 min read

Summary

The paper presents HubScan, an open-source security scanner that detects hubness poisoning in Retrieval-Augmented Generation (RAG) systems: a vulnerability in which a few "hub" items dominate top-k retrieval results across many unrelated queries.

Why It Matters

As AI systems increasingly rely on Retrieval-Augmented Generation, understanding and mitigating vulnerabilities like hubness poisoning is crucial for maintaining the integrity and reliability of these technologies. HubScan offers a practical solution to enhance security in AI applications.

Key Takeaways

  • Hubness poisoning poses significant risks to Retrieval-Augmented Generation systems: adversarial hubs can inject harmful content, alter search rankings, and bypass content filters.
  • HubScan employs a multi-detector architecture to identify and mitigate hubness threats effectively.
  • The tool supports various vector databases and retrieval techniques, enhancing its applicability in real-world scenarios.
  • HubScan achieved high recall rates in detecting adversarial hubs, demonstrating its effectiveness in security applications.
  • The framework is extensible, allowing for adaptation to evolving threats in AI systems.
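To make the core idea concrete: a "hub" is an indexed item that lands in the top-k retrieval results for a disproportionate number of queries. A minimal sketch of measuring this (the k-occurrence count) is below. This is an illustration with NumPy under the common assumption of cosine similarity over L2-normalised embeddings; the function name `k_occurrence` is hypothetical and not part of HubScan's API.

```python
import numpy as np

def k_occurrence(embeddings, queries, k=10):
    """Count how often each indexed item appears in the top-k
    nearest-neighbour results across a set of queries.
    Items with unusually high counts are hubness candidates."""
    # Cosine similarity between every query and every indexed item;
    # normalise rows defensively in case inputs are not unit-length.
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    sims = q @ e.T                          # shape: (n_queries, n_items)
    topk = np.argsort(-sims, axis=1)[:, :k]  # indices of k most similar items
    # Tally how many times each item index shows up across all queries.
    return np.bincount(topk.ravel(), minlength=len(embeddings))
```

In a benign corpus these counts are roughly balanced; an adversarially crafted embedding shows up in the top-k for far more queries than its peers, which is the signal the detectors operate on.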

Computer Science > Cryptography and Security
arXiv:2602.22427 (cs) · Submitted on 25 Feb 2026

Title: HubScan: Detecting Hubness Poisoning in Retrieval-Augmented Generation Systems
Authors: Idan Habler, Vineeth Sai Narajala, Stav Koren, Amy Chang, Tiffany Saade

Abstract: Retrieval-Augmented Generation (RAG) systems are essential to contemporary AI applications, allowing large language models to obtain external knowledge via vector similarity search. Nevertheless, these systems encounter a significant security flaw: hubness, where items frequently appear in the top-k retrieval results for a disproportionately high number of varied queries. These hubs can be exploited to introduce harmful content, alter search rankings, bypass content filtering, and decrease system performance. We introduce hubscan, an open-source security scanner that evaluates vector indices and embeddings to identify hubs in RAG systems. Hubscan presents a multi-detector architecture that integrates: (1) robust statistical hubness detection utilizing median/MAD-based z-scores, (2) cluster spread analysis to assess cross-cluster retrieval patterns, (3) stability testing under query perturbations, and (4) domain-aware and modality-aware detection for category-specific and cross-modal attacks. Our solution accommodates several vector databa...
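Detector (1) in the abstract uses a robust z-score built from the median and median absolute deviation (MAD) rather than the mean and standard deviation, so a handful of extreme hubs cannot mask themselves by inflating the statistics they are measured against. A hedged sketch of that idea follows; the function name, the 3.5 threshold, and the zero-MAD fallback are illustrative assumptions, not HubScan's actual implementation.

```python
import numpy as np

def robust_hub_flags(counts, z_thresh=3.5):
    """Flag items whose k-occurrence count is an outlier under a
    robust median/MAD z-score. The 0.6745 factor scales the MAD to
    be consistent with the standard deviation of a normal distribution."""
    counts = np.asarray(counts, dtype=float)
    med = np.median(counts)
    mad = np.median(np.abs(counts - med))
    if mad == 0:
        # Degenerate case (most counts identical): fall back to the
        # mean absolute deviation so outliers can still be scored.
        mad = np.mean(np.abs(counts - med)) or 1.0
    z = 0.6745 * (counts - med) / mad
    return z > z_thresh
```

Because the median and MAD ignore the tails, one item with a count of 200 in a corpus where typical counts sit near 10 is flagged immediately, whereas a mean/std z-score would be dragged upward by that same outlier.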

Related Articles

[2603.18532] Scaling Sim-to-Real Reinforcement Learning for Robot VLAs with Generative 3D Worlds
arXiv - Machine Learning · 4 min
[2603.12702] FGTR: Fine-Grained Multi-Table Retrieval via Hierarchical LLM Reasoning
arXiv - Machine Learning · 4 min
[2603.12681] Colluding LoRA: A Compositional Vulnerability in LLM Safety Alignment
arXiv - Machine Learning · 3 min
[2602.06098] A Theoretical Analysis of Test-Driven LLM Code Generation
arXiv - Machine Learning · 3 min

