[2602.22072] Understanding Artificial Theory of Mind: Perturbed Tasks and Reasoning in Large Language Models
Summary
This article examines the robustness of Theory of Mind (ToM) in large language models (LLMs) using perturbed false-belief tasks, revealing steep performance drops and the nuanced effects of Chain-of-Thought prompting.
Why It Matters
Understanding the limitations of LLMs in exhibiting Theory of Mind capabilities is crucial for advancing AI development. This study highlights the need for careful evaluation of AI reasoning processes, especially in tasks requiring complex understanding of others' mental states.
Key Takeaways
- LLMs show a steep decline in ToM capabilities when faced with task perturbations.
- Chain-of-Thought prompting can enhance ToM performance but may degrade accuracy in certain scenarios.
- A new annotated ToM dataset was introduced to facilitate further research in this area.
- The study questions whether LLMs possess any robust form of ToM, suggesting that prompting techniques such as Chain-of-Thought should be applied selectively.
- Metrics for evaluating reasoning chain correctness were proposed, aiding future AI assessments.
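The summary does not specify how the paper's reasoning-chain metrics are defined. As a loose illustration only, a "chain correctness" score might compare a model's generated reasoning steps against an annotated space of valid chains, e.g. by the longest matching prefix. The step labels and the function below are hypothetical, not taken from the paper.

```python
# Illustrative sketch only: scores a generated reasoning chain by the
# fraction of its leading steps that match some annotated valid chain.
# Step names are invented for this toy example.

def chain_correctness(generated, valid_chains):
    """Return best longest-matching-prefix length over valid chains,
    normalized by the length of the generated chain."""
    if not generated:
        return 0.0
    best = 0
    for valid in valid_chains:
        matched = 0
        for g, v in zip(generated, valid):
            if g != v:
                break
            matched += 1
        best = max(best, matched)
    return best / len(generated)

# Toy false-belief scenario (Sally-Anne-style), hypothetical labels:
valid = [
    ["sally_places_marble", "sally_leaves",
     "anne_moves_marble", "sally_believes_original_location"],
]
generated = ["sally_places_marble", "sally_leaves",
             "sally_believes_new_location"]
print(chain_correctness(generated, valid))  # 2 of 3 leading steps match
```

A faithfulness metric could then separately check whether the final answer follows from the generated chain, rather than from the chain's correctness.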
Computer Science > Computation and Language — arXiv:2602.22072 (cs)
[Submitted on 25 Feb 2026]
Title: Understanding Artificial Theory of Mind: Perturbed Tasks and Reasoning in Large Language Models
Authors: Christian Nickel, Laura Schrewe, Florian Mai, Lucie Flek
Abstract: Theory of Mind (ToM) refers to an agent's ability to model the internal states of others. Contributing to the debate whether large language models (LLMs) exhibit genuine ToM capabilities, our study investigates their ToM robustness using perturbations on false-belief tasks and examines the potential of Chain-of-Thought prompting (CoT) to enhance performance and explain the LLM's decision. We introduce a handcrafted, richly annotated ToM dataset, including classic and perturbed false-belief tasks, the corresponding spaces of valid reasoning chains for correct task completion, subsequent reasoning faithfulness, and task solutions, and propose metrics to evaluate reasoning chain correctness and to what extent final answers are faithful to reasoning traces of the generated CoT. We show a steep drop in ToM capabilities under task perturbation for all evaluated LLMs, questioning the notion of any robust form of ToM being present. While CoT prompting improves the ToM performance overall in a faithful manner, it surprisingly de...