[2602.14529] Disentangling Deception and Hallucination Failures in LLMs

arXiv - AI · 3 min read

Summary

This paper distinguishes deception from hallucination failures in large language models (LLMs), proposing a mechanism-oriented perspective that separates what a model knows (Knowledge Existence) from how it expresses that knowledge (Behavior Expression).

Why It Matters

Understanding the different failure modes in LLMs is crucial for improving their reliability. By disentangling deception from hallucination, researchers can target the underlying cause of an inaccurate output rather than treating every factual error as missing knowledge, enabling more effective mitigations and progress in AI safety.

Key Takeaways

  • Deception and hallucination are qualitatively different failure modes in LLMs.
  • The paper proposes a mechanism-oriented perspective to analyze these failures.
  • A controlled environment was constructed to study entity-centric factual queries.
  • The analysis uses representation separability, sparse interpretability, and inference-time activation steering (a minimal probe sketch follows this list).
  • Improving understanding of these failures can enhance LLM reliability.
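
The paper's code is not included in this digest, so the following is only a rough illustration of the kind of check "representation separability" implies: train a linear probe on hidden states and see whether it can tell the two failure modes apart. Everything here (dimensions, labels, simulated activations) is a placeholder, not the authors' data or method.

```python
# Hedged sketch only: a linear probe for "representation separability".
# The hidden states below are simulated; in the paper's setting they
# would be LLM activations collected under the two failure modes.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
d, n = 256, 400  # hidden size / examples per failure mode (placeholders)

# Two clusters standing in for activations from "hallucination" (label 0)
# and "deception" (label 1) cases.
halluc = rng.normal(0.0, 1.0, size=(n, d))
decept = rng.normal(0.5, 1.0, size=(n, d))
X = np.vstack([halluc, decept])
y = np.array([0] * n + [1] * n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# High held-out accuracy suggests the two failure modes occupy linearly
# separable regions of this layer's representation space.
print(f"probe accuracy: {probe.score(X_te, y_te):.3f}")
```

In a real experiment, the two clusters would be hidden states collected from the paper's controlled environment under hallucination versus deception behavior, probed layer by layer.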

Computer Science > Artificial Intelligence

arXiv:2602.14529 (cs) [Submitted on 16 Feb 2026]

Title: Disentangling Deception and Hallucination Failures in LLMs

Authors: Haolang Lu, Hongrui Peng, WeiYe Fu, Guoshun Nan, Xinye Cao, Xingrui Li, Hongcan Guo, Kun Wang

Abstract: Failures in large language models (LLMs) are often analyzed from a behavioral perspective, where incorrect outputs in factual question answering are commonly associated with missing knowledge. In this work, focusing on entity-based factual queries, we suggest that such a view may conflate different failure mechanisms, and propose an internal, mechanism-oriented perspective that separates Knowledge Existence from Behavior Expression. Under this formulation, hallucination and deception correspond to two qualitatively different failure modes that may appear similar at the output level but differ in their underlying mechanisms. To study this distinction, we construct a controlled environment for entity-centric factual questions in which knowledge is preserved while behavioral expression is selectively altered, enabling systematic analysis of four behavioral cases. We analyze these failure modes through representation separability, sparse interpretability, and inference-time activation steering.

Subjects: Artificial Intelligence (cs.AI)

Cite as: arXiv:2602.14529 [cs.AI]
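
The abstract's "inference-time activation steering" generally means adding a steering vector to a layer's hidden states during the forward pass, often derived as a difference of mean activations between two behavioral conditions. Below is a minimal sketch of that mechanism, assuming a toy model and a random placeholder vector rather than anything from the paper.

```python
# Hedged sketch only: inference-time activation steering via a forward
# hook. The model is a toy stack of layers and the steering vector is
# random; the paper's actual vectors and insertion points are not shown here.
import torch
import torch.nn as nn

torch.manual_seed(0)
d_model = 64

# Stand-in for a transformer: a small stack of Linear + GELU blocks.
model = nn.Sequential(
    *[nn.Sequential(nn.Linear(d_model, d_model), nn.GELU()) for _ in range(4)]
)
model.eval()

# Hypothetical steering vector; in practice often the difference of mean
# activations between two behavioral conditions (e.g. truthful vs. deceptive).
steer = torch.randn(d_model)
alpha = 4.0                # steering strength (placeholder)
layer_to_steer = 2

def steering_hook(module, inputs, output):
    # Returning a tensor from a forward hook replaces the layer's output.
    return output + alpha * steer

handle = model[layer_to_steer].register_forward_hook(steering_hook)
x = torch.randn(1, d_model)
with torch.no_grad():
    steered = model(x)
handle.remove()
with torch.no_grad():
    baseline = model(x)

print("output shift norm:", (steered - baseline).norm().item())
```

The appeal of this kind of intervention is causal: if shifting activations along one direction changes the expressed behavior while the underlying knowledge stays intact, that direction is a candidate mechanism rather than a mere correlate.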

Related Articles

Llms

I can't help rooting for tiny open source AI model maker Arcee | TechCrunch

Arcee is a tiny 26-person U.S. startup that built a high-performing, massive, open source LLM. And it's gaining popularity with OpenClaw ...

TechCrunch - AI · 4 min ·
Llms

Anthropic Teams Up With Its Rivals to Keep AI From Hacking Everything | WIRED

The AI lab's Project Glasswing will bring together Apple, Google, and more than 45 other organizations. They'll use the new Claude Mythos...

Wired - AI · 7 min ·
Llms

The public needs to control AI-run infrastructure, labor, education, and governance— NOT private actors

A lot of discussion around AI is becoming siloed, and I think that is dangerous. People in AI-focused spaces often talk as if the only qu...

Reddit - Artificial Intelligence · 1 min ·
Llms

Agents that write their own code at runtime and vote on capabilities, no human in the loop

hollowOS just hit v4.4 and I added something that I haven’t seen anyone else do. Previous versions gave you an OS for agents: structured ...

Reddit - Artificial Intelligence · 1 min ·

