[2506.09886] Probabilistic distances-based hallucination detection in LLMs with RAG
Summary
This paper presents a method for detecting hallucinations in large language models (LLMs) operating in retrieval-augmented generation (RAG) settings. By measuring probabilistic distances between the distributions of prompt and response token embeddings, it achieves state-of-the-art or competitive performance in assessing the factuality of AI-generated text.
Why It Matters
As LLMs become integral to a growing range of applications, ensuring their reliability is crucial. This research addresses the persistent problem of hallucinations with a detection method that improves the safety and accuracy of AI outputs, which is vital for user trust and for the effectiveness of downstream applications.
Key Takeaways
- Introduces a probabilistic method for hallucination detection in LLMs.
- Focuses on the geometric structure of token embeddings for factuality assessment.
- Achieves state-of-the-art performance while being unsupervised and efficient.
- Demonstrates transferability from natural language inference (NLI) tasks.
- Addresses a critical gap in existing hallucination detection methods for RAG systems.
arXiv:2506.09886 [cs] — Computer Science > Computation and Language
Submitted on 11 Jun 2025 (v1), last revised 24 Feb 2026 (this version, v2)
Title: Probabilistic distances-based hallucination detection in LLMs with RAG
Authors: Rodion Oblovatny, Alexandra Kuleshova, Konstantin Polev, Alexey Zaytsev
Abstract: Detecting hallucinations in large language models (LLMs) is critical for their safety in many applications. Without proper detection, these systems often provide harmful, unreliable answers. In recent years, LLMs have been actively used in retrieval-augmented generation (RAG) settings. However, hallucinations remain even in this setting, and while numerous hallucination detection methods have been proposed, most approaches are not specifically designed for RAG systems. To overcome this limitation, we introduce a hallucination detection method based on estimating the distances between the distributions of prompt token embeddings and language model response token embeddings. The method examines the geometric structure of token hidden states to reliably extract a signal of factuality in text, while remaining friendly to long sequences. Extensive experiments demonstrate that our method achieves state-of-the-art or competitive performance. It also has transferability from solving the NLI task to the halluc...
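To illustrate the core idea of a distributional distance between prompt and response token embeddings, here is a minimal sketch in Python. It is not the authors' exact method: it simply fits a Gaussian to each set of hidden states and computes the closed-form 2-Wasserstein (Fréchet) distance between them, one common probabilistic distance; the embedding arrays are synthetic stand-ins for real LLM hidden states.

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(x, y, eps=1e-6):
    """2-Wasserstein distance between Gaussians fitted to two
    embedding sets x, y of shape (n_tokens, hidden_dim)."""
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    # Regularize covariances so the matrix square root is stable.
    cov_x = np.cov(x, rowvar=False) + eps * np.eye(x.shape[1])
    cov_y = np.cov(y, rowvar=False) + eps * np.eye(y.shape[1])
    cov_cross = sqrtm(cov_x @ cov_y)
    if np.iscomplexobj(cov_cross):  # drop tiny imaginary parts
        cov_cross = cov_cross.real
    d2 = np.sum((mu_x - mu_y) ** 2) + np.trace(cov_x + cov_y - 2 * cov_cross)
    return float(np.sqrt(max(d2, 0.0)))

# Synthetic stand-ins for prompt / response token hidden states.
rng = np.random.default_rng(0)
prompt_emb = rng.normal(0.0, 1.0, size=(40, 8))
response_emb = rng.normal(0.5, 1.0, size=(30, 8))

score = frechet_distance(prompt_emb, response_emb)
```

In a real detector, a larger `score` (response embeddings drifting away from the prompt's distribution) could be thresholded as a hallucination signal; the paper's actual choice of distance and hidden-state layer may differ.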