[2603.28925] Theory of Mind and Self-Attributions of Mentality are Dissociable in LLMs

Computer Science > Computation and Language

arXiv:2603.28925 (cs) [Submitted on 30 Mar 2026]

Title: Theory of Mind and Self-Attributions of Mentality are Dissociable in LLMs

Authors: Junsol Kim, Winnie Street, Roberta Rocca, Daine M. Korngiebel, Adam Waytz, James Evans, Geoff Keeling

Abstract: Safety fine-tuning in Large Language Models (LLMs) seeks to suppress potentially harmful forms of mind-attribution, such as models asserting their own consciousness or claiming to experience emotions. We investigate whether suppressing mind-attribution tendencies degrades intimately related socio-cognitive abilities such as Theory of Mind (ToM). Through safety ablation and mechanistic analyses of representational similarity, we demonstrate that LLM attributions of mind to themselves and to technological artefacts are behaviorally and mechanistically dissociable from ToM capabilities. Nevertheless, safety fine-tuned models under-attribute mind to non-human animals relative to human baselines and are less likely to exhibit spiritual belief, suppressing widely shared perspectives regarding the distribution and nature of non-human minds.

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Cite as: arXiv:2603.28925 [cs.CL] (or arXiv:2603.28925v1 [cs.CL] for this version), https://doi.org/10.48550/arX...
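The abstract names "mechanistic analyses of representational similarity" but does not spell out the procedure. A common form of such analysis is representational similarity analysis (RSA): build a dissimilarity matrix over a model's hidden-state vectors for a set of stimuli under each condition, then correlate the two matrices. The sketch below is an illustrative assumption, not the authors' actual method; the hidden-state vectors and condition names are hypothetical.

```python
# Minimal RSA sketch (pure Python). Assumes we have hidden-state
# vectors for the same stimuli under two prompt conditions; the
# paper's actual procedure is not described on the abstract page.
import math

def cosine_distance(u, v):
    # 1 - cosine similarity between two vectors
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

def rdm(vectors):
    """Representational dissimilarity matrix: pairwise cosine distances."""
    n = len(vectors)
    return [[cosine_distance(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]

def upper_triangle(m):
    # Flatten the strict upper triangle (the diagonal is always zero)
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    # Pearson correlation between two equal-length sequences
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical hidden-state vectors for four stimuli under two
# conditions (e.g., ToM prompts vs. self-attribution prompts).
cond_a = [[1.0, 0.0, 0.2], [0.9, 0.1, 0.3], [0.0, 1.0, 0.5], [0.1, 0.9, 0.4]]
cond_b = [[0.8, 0.2, 0.1], [0.7, 0.3, 0.2], [0.2, 0.8, 0.6], [0.3, 0.7, 0.5]]

# A high correlation between the two RDMs would indicate shared
# representational structure; a low one, dissociable representations.
similarity = pearson(upper_triangle(rdm(cond_a)), upper_triangle(rdm(cond_b)))
```

In an actual study the vectors would come from a model's layer activations, and a rank correlation (e.g., Spearman) is often preferred over Pearson for comparing RDMs.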

Originally published on April 01, 2026. Curated by AI News.
