[2602.12150] GPT-4o Lacks Core Features of Theory of Mind
Summary
The paper investigates whether Large Language Models (LLMs) possess a Theory of Mind (ToM), finding that although they approximate human judgments on simple social tasks, they lack a coherent, domain-general model of how mental states cause behavior.
Why It Matters
Understanding where LLMs fall short of human-like cognition is crucial for developing AI systems that interact socially and ethically. This research highlights the gap between benchmark performance and genuine mental-state reasoning, with implications for AI safety and for deployment in sensitive contexts.
Key Takeaways
- LLMs show proficiency in social tasks but lack a true Theory of Mind.
- Current evaluations do not adequately test LLMs' understanding of mental states.
- A new evaluation framework reveals inconsistencies between LLMs' action predictions and their corresponding mental state inferences.
- Findings suggest that LLMs' social proficiency does not stem from a domain-general or consistent ToM.
- The results carry implications for AI development and for ethical considerations in deployment.
Computer Science > Artificial Intelligence
arXiv:2602.12150 (cs)
[Submitted on 12 Feb 2026 (v1), last revised 13 Feb 2026 (this version, v2)]
Title: GPT-4o Lacks Core Features of Theory of Mind
Authors: John Muchovej, Amanda Royka, Shane Lee, Julian Jara-Ettinger
Abstract: Do Large Language Models (LLMs) possess a Theory of Mind (ToM)? Research into this question has focused on evaluating LLMs against benchmarks and found success across a range of social tasks. However, these evaluations do not test for the actual representations posited by ToM: namely, a causal model of mental states and behavior. Here, we use a cognitively-grounded definition of ToM to develop and test a new evaluation framework. Specifically, our approach probes whether LLMs have a coherent, domain-general, and consistent model of how mental states cause behavior -- regardless of whether that model matches a human-like ToM. We find that even though LLMs succeed in approximating human judgments in a simple ToM paradigm, they fail at a logically equivalent task and exhibit low consistency between their action predictions and corresponding mental state inferences. As such, these findings suggest that the social proficiency exhibited by LLMs is not the result of a domain-general or consistent ToM.
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machin...
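To make the abstract's consistency probe concrete, here is a minimal illustrative sketch in Python. It is not the authors' code: `consistency_rate`, the stub model, and the Sally-Anne-style scenario are hypothetical stand-ins assumed for illustration. The idea it captures is the one the abstract states: asking a model two logically linked forced-choice questions about the same agent and checking whether its action prediction coheres with its own belief inference.

```python
from typing import Callable, Dict, List

def consistency_rate(scenarios: List[Dict[str, str]],
                     ask: Callable[[str], str]) -> float:
    """Fraction of scenarios in which the model's action prediction
    matches its own belief inference about the same agent.

    In a false-belief setup, a rational agent searches wherever it
    believes the object to be, so the two answers should coincide if
    the model holds one coherent model of the agent.
    """
    agree = 0
    for s in scenarios:
        action = ask(s["story"] + " " + s["action_q"]).strip().lower()
        belief = ask(s["story"] + " " + s["belief_q"]).strip().lower()
        if action == belief:  # coherent belief-action pair
            agree += 1
    return agree / len(scenarios)

# Toy usage: a stub "model" that answers the belief question correctly
# but the action question incorrectly, the kind of incoherence such a
# probe is designed to surface. (Hypothetical data, not the paper's.)
story = ("Sally puts her ball in the basket and leaves. "
         "While she is away, Anne moves the ball to the box.")
scenario = {
    "story": story,
    "action_q": "Where will Sally look for her ball: basket or box?",
    "belief_q": "Where does Sally think her ball is: basket or box?",
}
stub = {
    story + " " + scenario["action_q"]: "box",
    story + " " + scenario["belief_q"]: "basket",
}
print(consistency_rate([scenario], stub.get))  # -> 0.0
```

A low score on a probe of this shape would indicate that the model's action predictions and mental-state inferences are not generated by a single consistent model of the agent, which is the pattern the paper reports for GPT-4o.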