[2602.22427] HubScan: Detecting Hubness Poisoning in Retrieval-Augmented Generation Systems
Summary
The paper presents HubScan, an open-source security scanner that analyzes vector indices and embeddings to detect hubness poisoning in Retrieval-Augmented Generation (RAG) systems, a significant security flaw in AI applications that rely on vector similarity search.
Why It Matters
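As AI applications increasingly rely on Retrieval-Augmented Generation, understanding and mitigating vulnerabilities like hubness poisoning is crucial to their integrity and reliability. Hubs, which are items that appear in the top-k results for a disproportionately high number of varied queries, can be exploited to inject harmful content, alter search rankings, bypass content filtering, and degrade system performance. By scanning vector indices for such hubs, HubScan gives practitioners a practical way to audit the retrieval layer of these systems.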
Key Takeaways
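- Hubness poisoning exploits hubs, which are items that surface in the top-k results for a disproportionately high number of diverse queries, to inject harmful content, manipulate search rankings, and bypass content filters.
- HubScan uses a multi-detector architecture to identify hubs, combining median/MAD-based robust z-scores, cluster spread analysis of cross-cluster retrieval patterns, stability testing under query perturbations, and domain-aware and modality-aware detection for category-specific and cross-modal attacks.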
- The tool supports various vector databases and retrieval techniques, enhancing its applicability in real-world scenarios.
- HubScan achieved high recall in detecting adversarial hubs in the authors' evaluation.
- The framework is extensible, allowing for adaptation to evolving threats in AI systems.
arXiv:2602.22427 (cs), Cryptography and Security. Submitted on 25 Feb 2026.
Authors: Idan Habler, Vineeth Sai Narajala, Stav Koren, Amy Chang, Tiffany Saade
Abstract: Retrieval-Augmented Generation (RAG) systems are essential to contemporary AI applications, allowing large language models to obtain external knowledge via vector similarity search. Nevertheless, these systems encounter a significant security flaw: hubness, meaning items that frequently appear in the top-k retrieval results for a disproportionately high number of varied queries. These hubs can be exploited to introduce harmful content, alter search rankings, bypass content filtering, and decrease system performance. We introduce HubScan, an open-source security scanner that evaluates vector indices and embeddings to identify hubs in RAG systems. HubScan presents a multi-detector architecture that integrates: (1) robust statistical hubness detection utilizing median/MAD-based z-scores, (2) cluster spread analysis to assess cross-cluster retrieval patterns, (3) stability testing under query perturbations, and (4) domain-aware and modality-aware detection for category-specific and cross-modal attacks. Our solution accommodates several vector databa...
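The first two detectors named in the abstract can be illustrated with a small sketch. It assumes brute-force cosine-similarity retrieval and measures an item's hubness as its k-occurrence count N_k (the number of queries whose top-k results include it). It then scores those counts with median/MAD z-scores scaled by 1.4826, and measures cluster spread as the normalized entropy of the query clusters that retrieve each item. All function names, the scaling constant, the MAD = 0 fallback, and the entropy-based spread measure are illustrative assumptions, not HubScan's actual implementation or API.

```python
import math
import statistics
from collections import Counter


def normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def top_k(query, unit_docs, k):
    """Indices of the k unit-normalized docs most cosine-similar to the query."""
    q = normalize(query)
    sims = [sum(a * b for a, b in zip(q, d)) for d in unit_docs]
    return sorted(range(len(unit_docs)), key=lambda i: sims[i], reverse=True)[:k]


def k_occurrence(queries, docs, k):
    """N_k(i): how many queries retrieve doc i among their top-k results."""
    unit_docs = [normalize(d) for d in docs]
    counts = [0] * len(docs)
    for q in queries:
        for i in top_k(q, unit_docs, k):
            counts[i] += 1
    return counts


def robust_z(values):
    """Median/MAD z-scores; the 1.4826 factor makes MAD estimate a normal std dev."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # Assumed fallback: with zero MAD, any deviation from the median is extreme.
        return [0.0 if v == med else math.copysign(math.inf, v - med) for v in values]
    return [(v - med) / (1.4826 * mad) for v in values]


def cluster_spread(queries, labels, docs, k):
    """Normalized entropy (0 to 1) of the query clusters that retrieve each doc."""
    unit_docs = [normalize(d) for d in docs]
    per_doc = [Counter() for _ in docs]
    for q, label in zip(queries, labels):
        for i in top_k(q, unit_docs, k):
            per_doc[i][label] += 1
    n_clusters = len(set(labels))
    spreads = []
    for c in per_doc:
        total = sum(c.values())
        if total == 0 or n_clusters < 2:
            spreads.append(0.0)
            continue
        h = -sum((n / total) * math.log(n / total) for n in c.values())
        spreads.append(h / math.log(n_clusters))
    return spreads


if __name__ == "__main__":
    # Toy index: one doc per query cluster plus a hub along the centroid direction.
    docs = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
    queries = [[1, 0, 0], [1, 0.1, 0], [0, 1, 0], [0, 1, 0.1], [0, 0, 1], [0.1, 0, 1]]
    labels = [0, 0, 1, 1, 2, 2]
    counts = k_occurrence(queries, docs, k=2)
    print("N_k:", counts)
    print("robust z:", robust_z(counts))
    print("cluster spread:", cluster_spread(queries, labels, docs, k=2))
```

In the toy index, the fourth document points toward the centroid of all queries, so every query retrieves it alongside its own cluster's document. It therefore gets the largest k-occurrence count and a cluster spread of about 1, while each per-cluster document has a spread of 0. Because three of the four counts are equal, the MAD is zero here and the fallback applies. On realistic indices, a cutoff such as 3.5 on the robust z-score is a common convention for modified z-scores; the threshold HubScan actually uses is not stated in this summary.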