[2602.13224] A Geometric Taxonomy of Hallucinations in LLMs

Summary

This article presents a geometric taxonomy of hallucinations in large language models (LLMs), dividing them into three types: unfaithfulness, confabulation, and factual error. It also discusses the distinct challenges of detecting each type.

Why It Matters

Understanding hallucinations in LLMs is crucial for improving AI reliability and safety. By categorizing these phenomena, the research provides insights into detection methods and the limitations of current benchmarks, which can inform future AI development and evaluation strategies.

Key Takeaways

  • Hallucinations in LLMs can be categorized into three types: unfaithfulness (failure to engage with provided context), confabulation (invention of semantically foreign content), and factual error (incorrect claims within a correct conceptual frame).
  • Detection of benchmark hallucinations is domain-local: AUROC reaches 0.76-0.99 within a domain but drops to chance (0.50) across domains.
  • Current benchmarks may capture generation artifacts rather than the full complexity of hallucinations, motivating improved verification mechanisms.
  • Discriminative directions in embedding space are approximately orthogonal between domains (mean cosine similarity -0.07), which limits cross-domain transfer of detectors.
  • Type III errors (factual inaccuracies within a correct conceptual frame) score 0.478 AUROC, indistinguishable from chance, and therefore require external verification.
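
The domain-local finding above can be illustrated with a small synthetic sketch (the embeddings, dimensions, and shift magnitudes below are illustrative assumptions, not the paper's data): a linear probe trained to separate hallucinated from faithful embeddings in one domain fails to transfer to a second domain whose discriminative direction is orthogonal.

```python
# Sketch (synthetic data): a hallucination probe trained in domain A
# scores well within-domain but near chance in domain B when the two
# domains' discriminative directions are orthogonal.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
d = 128  # embedding dimension (illustrative)

def make_domain(offset_dir, n=400):
    """Synthetic domain: hallucinated examples (y=1) drift along offset_dir."""
    X = rng.normal(size=(n, d))
    y = rng.integers(0, 2, size=n)
    X[y == 1] += 3.0 * offset_dir
    return X, y

# Two domains whose discriminative directions are orthogonal by construction
Xa, ya = make_domain(np.eye(d)[0])
Xb, yb = make_domain(np.eye(d)[1])

clf_a = LogisticRegression(max_iter=1000).fit(Xa, ya)
auroc_within = roc_auc_score(ya, clf_a.decision_function(Xa))
auroc_cross = roc_auc_score(yb, clf_a.decision_function(Xb))

# Compare the learned probe directions themselves
clf_b = LogisticRegression(max_iter=1000).fit(Xb, yb)
w_a = clf_a.coef_.ravel() / np.linalg.norm(clf_a.coef_)
w_b = clf_b.coef_.ravel() / np.linalg.norm(clf_b.coef_)
cosine = float(w_a @ w_b)

print(f"within-domain AUROC: {auroc_within:.2f}")          # high
print(f"cross-domain AUROC:  {auroc_cross:.2f}")           # near 0.50
print(f"cosine(dir_A, dir_B): {cosine:.2f}")               # near 0
```

This mirrors the qualitative pattern the paper reports (high within-domain AUROC, chance-level transfer, near-zero cosine between domain directions), not its actual measurements.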

Computer Science > Artificial Intelligence
arXiv:2602.13224 (cs) · Submitted on 26 Jan 2026

Title: A Geometric Taxonomy of Hallucinations in LLMs
Authors: Javier Marín

Abstract: The term "hallucination" in large language models conflates distinct phenomena with different geometric signatures in embedding space. We propose a taxonomy identifying three types: unfaithfulness (failure to engage with provided context), confabulation (invention of semantically foreign content), and factual error (incorrect claims within correct conceptual frames). We observe a striking asymmetry. On standard benchmarks where hallucinations are LLM-generated, detection is domain-local: AUROC 0.76-0.99 within domains, but 0.50 (chance level) across domains. Discriminative directions are approximately orthogonal between domains (mean cosine similarity -0.07). On human-crafted confabulations (invented institutions, redefined terminology, fabricated mechanisms), a single global direction achieves 0.96 AUROC with 3.8% cross-domain degradation. We interpret this divergence as follows: benchmarks capture generation artifacts (stylistic signatures of prompted fabrication), while human-crafted confabulations capture genuine topical drift. The geometric structure differs because the underlying phenomena differ. Type III errors show 0.478 AUROC, indistinguishable from chance. This ref...
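
For context on the -0.07 mean cosine similarity the abstract cites: in a high-dimensional embedding space, independently chosen unit vectors are nearly orthogonal, so a near-zero mean cosine is consistent with domain directions being unrelated. A quick sketch (the dimension and number of domains below are assumptions for illustration):

```python
# Sketch: random unit vectors in high dimension are near-orthogonal,
# so pairwise cosine similarities concentrate around 0 (std ~ 1/sqrt(d)).
import numpy as np

rng = np.random.default_rng(0)
d = 4096          # assumed hidden size, typical for LLMs
n_domains = 8     # assumed number of domain-specific directions

dirs = rng.normal(size=(n_domains, d))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)

# Cosine similarities over all distinct domain pairs
cos = dirs @ dirs.T
pairs = cos[np.triu_indices(n_domains, k=1)]
print(f"mean pairwise cosine: {pairs.mean():.3f}")  # close to 0
```

By contrast, the single global direction the paper finds for human-crafted confabulations is exactly what this baseline rules out for random, unrelated directions.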
