[2602.22427] HubScan: Detecting Hubness Poisoning in Retrieval-Augmented Generation Systems

arXiv - AI February 27, 2026 4 min read Article

Summary

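The paper presents HubScan, an open-source security scanner that detects hubness poisoning in Retrieval-Augmented Generation (RAG) systems. Hubness means that a few items appear in the top-k retrieval results for a disproportionately large number of varied queries, which attackers can exploit.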

Why It Matters

As AI systems increasingly rely on Retrieval-Augmented Generation, understanding and mitigating vulnerabilities like hubness poisoning is crucial for maintaining the integrity and reliability of these technologies. HubScan offers a practical solution to enhance security in AI applications.

Key Takeaways

Hubness poisoning lets attackers introduce harmful content, alter search rankings, bypass content filtering, and degrade the performance of Retrieval-Augmented Generation systems.
HubScan employs a multi-detector architecture to identify and mitigate hubness threats effectively.
The tool supports various vector databases and retrieval techniques, enhancing its applicability in real-world scenarios.
HubScan achieved high recall rates in detecting adversarial hubs, demonstrating its effectiveness in security applications.
The framework is extensible, allowing for adaptation to evolving threats in AI systems.

Computer Science > Cryptography and Security

arXiv:2602.22427 (cs)

[Submitted on 25 Feb 2026]

Title: HubScan: Detecting Hubness Poisoning in Retrieval-Augmented Generation Systems

Authors: Idan Habler, Vineeth Sai Narajala, Stav Koren, Amy Chang, Tiffany Saade

Abstract: Retrieval-Augmented Generation (RAG) systems are essential to contemporary AI applications, allowing large language models to obtain external knowledge via vector similarity search. Nevertheless, these systems encounter a significant security flaw: hubness - items that frequently appear in the top-k retrieval results for a disproportionately high number of varied queries. These hubs can be exploited to introduce harmful content, alter search rankings, bypass content filtering, and decrease system performance. We introduce hubscan, an open-source security scanner that evaluates vector indices and embeddings to identify hubs in RAG systems. Hubscan presents a multi-detector architecture that integrates: (1) robust statistical hubness detection utilizing median/MAD-based z-scores, (2) cluster spread analysis to assess cross-cluster retrieval patterns, (3) stability testing under query perturbations, and (4) domain-aware and modality-aware detection for category-specific and cross-modal attacks. Our solution accommodates several vector databa...
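The first detector named in the abstract, median/MAD-based z-scores, can be illustrated with a minimal sketch. This is not hubscan's code or API: the function names, the toy retrieval results, and the 3.5 cutoff (a common convention for robust z-scores) are all assumptions made for illustration. The idea is to count how often each document appears in top-k results across many queries, then flag documents whose count sits far above the median when measured in units of the median absolute deviation (MAD).

```python
from collections import Counter
from statistics import median


def hub_counts(results, num_docs):
    """Count how often each document ID appears across all top-k result lists."""
    counts = Counter()
    for top_k in results:
        counts.update(top_k)
    # Include documents that were never retrieved (count 0).
    return [counts[i] for i in range(num_docs)]


def robust_z_scores(values):
    """Median/MAD z-scores. The factor 1.4826 rescales the MAD so that it
    estimates the standard deviation for normally distributed data."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # Degenerate case: more than half the values equal the median.
        return [0.0 if v == med else float("inf") if v > med else float("-inf")
                for v in values]
    return [(v - med) / (1.4826 * mad) for v in values]


def flag_hubs(results, num_docs, threshold=3.5):
    """Return IDs of documents whose retrieval count is a robust outlier.
    The threshold is an assumed convention, not a value taken from the paper."""
    scores = robust_z_scores(hub_counts(results, num_docs))
    return [doc_id for doc_id, z in enumerate(scores) if z > threshold]


# Hypothetical top-3 results for ten queries over eight documents.
# Document 7 is retrieved by every query; the others appear 2 to 4 times.
toy_results = [
    [0, 1, 7], [0, 2, 7], [1, 2, 7], [1, 3, 7], [2, 3, 7],
    [2, 5, 7], [3, 5, 7], [4, 5, 7], [4, 6, 7], [5, 6, 7],
]

print(flag_hubs(toy_results, num_docs=8))  # prints [7]
```

The median and MAD are used instead of the mean and standard deviation because a hub inflates the latter two, which can hide the very outlier being sought. In this toy data the counts are [2, 3, 4, 3, 2, 4, 2, 10], so the median is 3, the MAD is 1, and only document 7 exceeds the cutoff.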

Related Articles

[2603.18532] Scaling Sim-to-Real Reinforcement Learning for Robot VLAs with Generative 3D Worlds

[2603.12702] FGTR: Fine-Grained Multi-Table Retrieval via Hierarchical LLM Reasoning

[2603.12681] Colluding LoRA: A Compositional Vulnerability in LLM Safety Alignment

[2602.06098] A Theoretical Analysis of Test-Driven LLM Code Generation
