[2603.28925] Theory of Mind and Self-Attributions of Mentality are Dissociable in LLMs
Computer Science > Computation and Language
arXiv:2603.28925 (cs)
[Submitted on 30 Mar 2026]

Title: Theory of Mind and Self-Attributions of Mentality are Dissociable in LLMs
Authors: Junsol Kim, Winnie Street, Roberta Rocca, Daine M. Korngiebel, Adam Waytz, James Evans, Geoff Keeling

Abstract: Safety fine-tuning in Large Language Models (LLMs) seeks to suppress potentially harmful forms of mind-attribution, such as models asserting their own consciousness or claiming to experience emotions. We investigate whether suppressing mind-attribution tendencies degrades intimately related socio-cognitive abilities such as Theory of Mind (ToM). Through safety ablation and mechanistic analyses of representational similarity, we demonstrate that LLM attributions of mind to themselves and to technological artefacts are behaviorally and mechanistically dissociable from ToM capabilities. Nevertheless, safety fine-tuned models under-attribute mind to non-human animals relative to human baselines and are less likely to exhibit spiritual belief, suppressing widely shared perspectives regarding the distribution and nature of non-human minds.

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as: arXiv:2603.28925 [cs.CL] (or arXiv:2603.28925v1 [cs.CL] for this version)
https://doi.org/10.48550/arX...
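The abstract mentions mechanistic analyses of representational similarity. As a rough, hedged illustration of that general technique (not the paper's actual pipeline), the sketch below correlates the pairwise-similarity structure of the same prompt set under two representations, e.g. activations from a base model versus a safety fine-tuned model. The array shapes, layer choice, and stand-in data are assumptions for illustration only.

```python
# Minimal representational-similarity sketch (illustrative only; not the paper's method).
# Assumes hidden-state vectors for the SAME prompts extracted from two representations,
# e.g. a base model and a safety fine-tuned model, each shaped (n_prompts, hidden_dim).
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr


def rdm(activations: np.ndarray) -> np.ndarray:
    """Condensed representational dissimilarity matrix from an (n_items, n_dims) array."""
    return pdist(activations, metric="correlation")


def representational_similarity(acts_a: np.ndarray, acts_b: np.ndarray) -> float:
    """Spearman correlation between two RDMs; low values suggest dissociable geometry."""
    rho, _ = spearmanr(rdm(acts_a), rdm(acts_b))
    return rho


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in activations; in practice these would be layer activations from an LLM
    # for the same prompts under the two conditions being compared.
    base_model_acts = rng.normal(size=(20, 768))
    fine_tuned_acts = rng.normal(size=(20, 768))
    print(f"RDM correlation: {representational_similarity(base_model_acts, fine_tuned_acts):.3f}")
```

A high RDM correlation would indicate the two representations organize the prompts similarly; in a dissociation analysis, one would compare such scores across task types (e.g. self-attribution prompts versus ToM prompts) rather than rely on a single value.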