[2602.14529] Disentangling Deception and Hallucination Failures in LLMs

arXiv - AI · 3 min read

Summary

This paper distinguishes deception from hallucination failures in large language models (LLMs), proposing a mechanism-oriented perspective that separates what a model knows (Knowledge Existence) from how it expresses that knowledge (Behavior Expression).

Why It Matters

Understanding the different failure modes in LLMs is crucial for improving their reliability. By disentangling deception from hallucination, researchers can target the underlying cause of an inaccurate output rather than treating every factual error as missing knowledge, enabling more effective mitigations and progress in AI safety.

Key Takeaways

  • Deception and hallucination are qualitatively different failure modes in LLMs.
  • The paper proposes a mechanism-oriented perspective to analyze these failures.
  • A controlled environment was constructed to study entity-centric factual queries.
  • The analysis uses representation separability, sparse interpretability, and inference-time activation steering (a minimal probe sketch follows this list).
  • Improving understanding of these failures can enhance LLM reliability.
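
The paper's code is not included in this digest, so the following is only a rough illustration of the kind of check "representation separability" implies: train a linear probe on hidden states and see whether it can tell the two failure modes apart. Everything here (dimensions, labels, simulated activations) is a placeholder, not the authors' data or method.

```python
# Hedged sketch only: a linear probe for "representation separability".
# The hidden states below are simulated; in the paper's setting they
# would be LLM activations collected under the two failure modes.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
d, n = 256, 400  # hidden size / examples per failure mode (placeholders)

# Two clusters standing in for activations from "hallucination" (label 0)
# and "deception" (label 1) cases.
halluc = rng.normal(0.0, 1.0, size=(n, d))
decept = rng.normal(0.5, 1.0, size=(n, d))
X = np.vstack([halluc, decept])
y = np.array([0] * n + [1] * n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# High held-out accuracy suggests the two failure modes occupy linearly
# separable regions of this layer's representation space.
print(f"probe accuracy: {probe.score(X_te, y_te):.3f}")
```

In a real experiment, the two clusters would be hidden states collected from the paper's controlled environment under hallucination versus deception behavior, probed layer by layer.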

Computer Science > Artificial Intelligence

arXiv:2602.14529 (cs) [Submitted on 16 Feb 2026]

Title: Disentangling Deception and Hallucination Failures in LLMs

Authors: Haolang Lu, Hongrui Peng, WeiYe Fu, Guoshun Nan, Xinye Cao, Xingrui Li, Hongcan Guo, Kun Wang

Abstract: Failures in large language models (LLMs) are often analyzed from a behavioral perspective, where incorrect outputs in factual question answering are commonly associated with missing knowledge. In this work, focusing on entity-based factual queries, we suggest that such a view may conflate different failure mechanisms, and propose an internal, mechanism-oriented perspective that separates Knowledge Existence from Behavior Expression. Under this formulation, hallucination and deception correspond to two qualitatively different failure modes that may appear similar at the output level but differ in their underlying mechanisms. To study this distinction, we construct a controlled environment for entity-centric factual questions in which knowledge is preserved while behavioral expression is selectively altered, enabling systematic analysis of four behavioral cases. We analyze these failure modes through representation separability, sparse interpretability, and inference-time activation steering.

Subjects: Artificial Intelligence (cs.AI)

Cite as: arXiv:2602.14529 [cs.AI]
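
The abstract's "inference-time activation steering" generally means adding a steering vector to a layer's hidden states during the forward pass, often derived as a difference of mean activations between two behavioral conditions. Below is a minimal sketch of that mechanism, assuming a toy model and a random placeholder vector rather than anything from the paper.

```python
# Hedged sketch only: inference-time activation steering via a forward
# hook. The model is a toy stack of layers and the steering vector is
# random; the paper's actual vectors and insertion points are not shown here.
import torch
import torch.nn as nn

torch.manual_seed(0)
d_model = 64

# Stand-in for a transformer: a small stack of Linear + GELU blocks.
model = nn.Sequential(
    *[nn.Sequential(nn.Linear(d_model, d_model), nn.GELU()) for _ in range(4)]
)
model.eval()

# Hypothetical steering vector; in practice often the difference of mean
# activations between two behavioral conditions (e.g. truthful vs. deceptive).
steer = torch.randn(d_model)
alpha = 4.0                # steering strength (placeholder)
layer_to_steer = 2

def steering_hook(module, inputs, output):
    # Returning a tensor from a forward hook replaces the layer's output.
    return output + alpha * steer

handle = model[layer_to_steer].register_forward_hook(steering_hook)
x = torch.randn(1, d_model)
with torch.no_grad():
    steered = model(x)
handle.remove()
with torch.no_grad():
    baseline = model(x)

print("output shift norm:", (steered - baseline).norm().item())
```

The appeal of this kind of intervention is causal: if shifting activations along one direction changes the expressed behavior while the underlying knowledge stays intact, that direction is a candidate mechanism rather than a mere correlate.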

Related Articles

Llms

I can't help rooting for tiny open source AI model maker Arcee | TechCrunch

Arcee is a tiny 26-person U.S. startup that built a high-performing, massive, open source LLM. And it's gaining popularity with OpenClaw ...

TechCrunch - AI · 4 min ·
Llms

Anthropic Teams Up With Its Rivals to Keep AI From Hacking Everything | WIRED

The AI lab's Project Glasswing will bring together Apple, Google, and more than 45 other organizations. They'll use the new Claude Mythos...

Wired - AI · 7 min ·
Llms

The public needs to control AI-run infrastructure, labor, education, and governance— NOT private actors

A lot of discussion around AI is becoming siloed, and I think that is dangerous. People in AI-focused spaces often talk as if the only qu...

Reddit - Artificial Intelligence · 1 min ·
Llms

Agents that write their own code at runtime and vote on capabilities, no human in the loop

hollowOS just hit v4.4 and I added something that I haven’t seen anyone else do. Previous versions gave you an OS for agents: structured ...

Reddit - Artificial Intelligence · 1 min ·

