[2602.13224] A Geometric Taxonomy of Hallucinations in LLMs
Summary
This article presents a geometric taxonomy of hallucinations in large language models (LLMs), distinguishing three types (unfaithfulness, confabulation, and factual error) and discussing the challenges of detecting each in embedding space.
Why It Matters
Understanding hallucinations in LLMs is crucial for improving AI reliability and safety. By categorizing these phenomena, the research provides insights into detection methods and the limitations of current benchmarks, which can inform future AI development and evaluation strategies.
Key Takeaways
- Hallucinations in LLMs can be categorized into three types: unfaithfulness, confabulation, and factual error.
- Detection of LLM-generated hallucinations is domain-local: AUROC reaches 0.76-0.99 within a domain but drops to chance level (0.50) across domains.
- Current benchmarks may not adequately capture the complexities of hallucinations, necessitating improved verification mechanisms.
- The geometric structure of embeddings influences the detection capabilities of different types of hallucinations.
- Type III errors (factual errors) are essentially undetectable from embedding geometry alone (AUROC 0.478, indistinguishable from chance) and require external verification.
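The takeaways above report detection quality as AUROC, the probability that a detector scores a randomly chosen hallucinated example above a randomly chosen faithful one. A minimal sketch of that statistic, using made-up detector scores (the score values and the `auroc` helper are illustrative, not from the paper):

```python
def auroc(pos_scores, neg_scores):
    """AUROC as a rank statistic: the probability that a hallucinated
    example scores above a faithful one, counting ties as half a win."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical detector scores: higher means "looks hallucinated".
hallucinated = [0.9, 0.8, 0.75, 0.6]
faithful = [0.3, 0.65, 0.55, 0.2]
print(auroc(hallucinated, faithful))  # 15 of 16 pairs ranked correctly: 0.9375
```

An in-domain detector in the paper's regime would sit near 1.0 on this scale; a cross-domain one would hover near 0.5, no better than flipping a coin.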
Computer Science > Artificial Intelligence
arXiv:2602.13224 (cs) [Submitted on 26 Jan 2026]
Title: A Geometric Taxonomy of Hallucinations in LLMs
Authors: Javier Marín
Abstract: The term "hallucination" in large language models conflates distinct phenomena with different geometric signatures in embedding space. We propose a taxonomy identifying three types: unfaithfulness (failure to engage with provided context), confabulation (invention of semantically foreign content), and factual error (incorrect claims within correct conceptual frames). We observe a striking asymmetry. On standard benchmarks where hallucinations are LLM-generated, detection is domain-local: AUROC 0.76-0.99 within domains, but 0.50 (chance level) across domains. Discriminative directions are approximately orthogonal between domains (mean cosine similarity -0.07). On human-crafted confabulations - invented institutions, redefined terminology, fabricated mechanisms - a single global direction achieves 0.96 AUROC with 3.8% cross-domain degradation. We interpret this divergence as follows: benchmarks capture generation artifacts (stylistic signatures of prompted fabrication), while human-crafted confabulations capture genuine topical drift. The geometric structure differs because the underlying phenomena differ. Type III errors show 0.478 AUROC - indistinguishable from chance. This ref...
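The abstract's claim that discriminative directions are "approximately orthogonal between domains" is a statement about cosine similarity: per-domain detector directions point in nearly unrelated directions in embedding space, so their cosine similarity is near zero (the paper reports a mean of -0.07). A minimal sketch with hypothetical 3-dimensional direction vectors (the vectors and domain names are illustrative, not from the paper):

```python
import math

def cosine(u, v):
    """Cosine similarity between two direction vectors: 1 = aligned,
    0 = orthogonal, -1 = opposed."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical discriminative directions learned in two different domains.
# Near-orthogonal vectors yield a cosine near 0, the regime the paper
# reports for benchmark hallucinations (mean cosine similarity -0.07).
dir_domain_a = [1.0, 0.1, 0.0]
dir_domain_b = [0.0, 0.1, 1.0]
print(round(cosine(dir_domain_a, dir_domain_b), 3))
```

Orthogonal directions are exactly why a detector trained in one domain transfers at chance level: projecting onto domain A's direction carries almost no information about domain B's hallucination axis.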