[2511.23055] MindPower: Enabling Theory-of-Mind Reasoning in VLM-based Embodied Agents

Summary

The paper presents MindPower, a robot-centric framework that adds Theory of Mind (ToM) reasoning to VLM-based embodied agents. The authors report that it outperforms GPT-4o by 12.77% in decision making and 12.49% in action generation.

Why It Matters

This research addresses a gap in embodied AI: current vision-language agents lack ToM-based decision-making, and existing benchmarks focus only on human mental states while ignoring the agent's own perspective. MindPower has the agent infer both its own and others' mental states before it decides and acts. Its Mind-Reward optimization objective encourages the model to keep its ToM reasoning and its behavior consistent.

Key Takeaways

  • MindPower integrates Theory of Mind reasoning into VLM-based embodied agents.
  • The framework chains Perception, Mental Reasoning, Decision Making, and Action, modeling both the agent's own and others' mental states.
  • Mind-Reward is an optimization objective that encourages consistent ToM reasoning and behavior.
  • The authors report that MindPower outperforms GPT-4o by 12.77% in decision making and 12.49% in action generation.
  • The work targets agents whose decisions and actions are guided by inferred mental states.

Computer Science > Artificial Intelligence

arXiv:2511.23055 (cs)

[Submitted on 28 Nov 2025 (v1), last revised 24 Feb 2026 (this version, v2)]

Title: MindPower: Enabling Theory-of-Mind Reasoning in VLM-based Embodied Agents

Authors: Ruoxuan Zhang, Qiyun Zheng, Zhiyu Zhou, Ziqi Liao, Siyu Wu, Jian-Yu Jiang-Lin, Bin Wen, Hongxia Xie, Jianlong Fu, Wen-Huang Cheng

Abstract: Theory of Mind (ToM) refers to the ability to infer others' mental states, such as beliefs, desires, and intentions. Current vision-language embodied agents lack ToM-based decision-making, and existing benchmarks focus solely on human mental states while ignoring the agent's own perspective, hindering coherent decision and action generation. To address this, we propose MindPower, a Robot-Centric framework integrating Perception, Mental Reasoning, Decision Making and Action. Given multimodal inputs, MindPower first perceives the environment and human states, then performs ToM Reasoning to model both self and others, and finally generates decisions and actions guided by inferred mental states. Furthermore, we introduce Mind-Reward, a novel optimization objective that encourages VLMs to produce consistent ToM Reasoning and behavior. Our model outperforms GPT-4o by 12.77% in decision making and 12.49% in action generation.

Subjects: Art...
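The staged flow described in the abstract (perceive, reason about self and others, decide, act) can be pictured with a small sketch. This is not the authors' implementation: every class and function name below is hypothetical, the rule-based stubs stand in for what would be VLM calls in a real system, and the "searching for keys" scenario is a made-up example rather than data from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Perception:
    """Hypothetical container for what the agent observes."""
    objects: list[str]
    human_state: str


@dataclass
class MentalState:
    """Beliefs, desires, and intentions: the ToM components named in the abstract."""
    beliefs: list[str] = field(default_factory=list)
    desires: list[str] = field(default_factory=list)
    intentions: list[str] = field(default_factory=list)


def perceive(observation: dict) -> Perception:
    # Stage 1 (stub): a real agent would run a VLM over multimodal input.
    return Perception(
        objects=list(observation.get("objects", [])),
        human_state=observation.get("human_state", "unknown"),
    )


def reason_about_minds(p: Perception) -> tuple[MentalState, MentalState]:
    # Stage 2 (stub): model both the agent itself and the human.
    other = MentalState()
    if p.human_state == "searching":
        other.beliefs.append("human does not know where the target is")
        other.desires.append("human wants the target")
        other.intentions.append("human will keep searching")
    me = MentalState()
    if p.objects:
        me.beliefs.append("I can see: " + ", ".join(p.objects))
    if other.desires:
        me.intentions.append("help the human")
    return me, other


def decide(me: MentalState, other: MentalState) -> str:
    # Stage 3 (stub): the decision is driven by the inferred mental states.
    return "assist" if "help the human" in me.intentions else "wait"


def act(decision: str, p: Perception) -> list[str]:
    # Stage 4 (stub): turn the decision into a symbolic action sequence.
    if decision == "assist" and p.objects:
        return [f"point_to({p.objects[0]})"]
    return ["idle()"]


def run_pipeline(observation: dict) -> tuple[str, list[str]]:
    p = perceive(observation)
    me, other = reason_about_minds(p)
    decision = decide(me, other)
    return decision, act(decision, p)
```

Calling `run_pipeline({"objects": ["keys"], "human_state": "searching"})` returns the decision `"assist"` with the action list `["point_to(keys)"]`. The point of the structure is that the decision function never sees raw perception, only the inferred mental states, which mirrors the ordering the abstract describes.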
