[2511.23055] MindPower: Enabling Theory-of-Mind Reasoning in VLM-based Embodied Agents

arXiv - AI · February 25, 2026

Summary

The paper presents MindPower, a framework that improves embodied agents' decision-making by integrating Theory of Mind (ToM) reasoning. It outperforms GPT-4o in both decision making and action generation.

Why It Matters

This research addresses a gap in current vision-language embodied agents: they lack ToM-based decision-making, and existing benchmarks consider only human mental states while ignoring the agent's own perspective. MindPower lets an agent infer both its own and others' mental states, which the authors argue is needed for coherent decisions and actions. Mind-Reward, a new optimization objective, further encourages the model to keep its ToM reasoning consistent with its behavior.

Key Takeaways

MindPower integrates Theory of Mind reasoning into embodied agents.
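The framework improves decision making and action generation by modeling both the agent's own mental state and those of others.
Mind-Reward is an optimization objective that encourages consistent ToM reasoning and behavior.
MindPower outperforms GPT-4o by 12.77% in decision making and 12.49% in action generation.
This research moves embodied agents toward decisions grounded in inferred mental states, not just perceived scenes.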
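MindPower is described as a Robot-Centric framework that chains Perception, Mental Reasoning, Decision Making and Action. The minimal Python sketch below walks a toy false-belief scenario through those four stages to show why modeling both "self" and "other" matters. It is not the paper's implementation: MindPower performs these stages with a VLM over multimodal inputs, whereas this sketch assumes an already-parsed observation dictionary, and every name in it (`MentalState`, `Percept`, `tom_reasoning`, and action primitives such as `navigate(...)`) is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MentalState:
    """Hypothetical belief/desire record for one agent (human or robot)."""
    beliefs: dict[str, str] = field(default_factory=dict)  # object -> believed location
    desires: list[str] = field(default_factory=list)


@dataclass
class Percept:
    world: dict[str, str]            # object -> true location, as the robot sees it
    human_last_seen: dict[str, str]  # object -> where the human last saw it
    human_goal: str                  # object the human appears to be searching for


def perceive(observation: dict) -> Percept:
    # Stage 1, Perception: a real agent would run a VLM on images or video;
    # here the observation is assumed to be pre-parsed into a dict.
    return Percept(
        world=observation["world"],
        human_last_seen=observation["human_last_seen"],
        human_goal=observation["human_goal"],
    )


def tom_reasoning(p: Percept) -> tuple[MentalState, MentalState]:
    # Stage 2, ToM Reasoning: model the other (human) and the self (robot).
    human = MentalState(beliefs=dict(p.human_last_seen), desires=[f"find {p.human_goal}"])
    robot = MentalState(beliefs=dict(p.world), desires=[f"help the human find {p.human_goal}"])
    return human, robot


def decide(p: Percept, human: MentalState, robot: MentalState) -> str:
    # Stage 3, Decision Making: intervene only if the human's belief is false,
    # i.e. it disagrees with the robot's own belief about the goal object.
    target = p.human_goal
    if human.beliefs.get(target) != robot.beliefs.get(target):
        return "fetch"
    return "wait"


def act(decision: str, p: Percept) -> list[str]:
    # Stage 4, Action: expand the decision into hypothetical action primitives.
    if decision == "fetch":
        loc = p.world[p.human_goal]
        return [f"navigate({loc})", f"pick({p.human_goal})", "hand_over(human)"]
    return ["wait()"]


def run_pipeline(observation: dict) -> list[str]:
    p = perceive(observation)
    human, robot = tom_reasoning(p)
    return act(decide(p, human, robot), p)


obs = {
    "world": {"keys": "table"},
    "human_last_seen": {"keys": "drawer"},
    "human_goal": "keys",
}
print(run_pipeline(obs))
```

Here the human last saw the keys in the drawer, but the robot sees them on the table. Because the inferred human belief differs from the robot's own belief, the sketch decides to fetch the keys; if the two beliefs agreed, it would simply wait. That comparison is only possible because the agent keeps a model of itself alongside its model of the human.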

arXiv:2511.23055 (cs) [Submitted on 28 Nov 2025 (v1), last revised 24 Feb 2026 (this version, v2)]

Authors: Ruoxuan Zhang, Qiyun Zheng, Zhiyu Zhou, Ziqi Liao, Siyu Wu, Jian-Yu Jiang-Lin, Bin Wen, Hongxia Xie, Jianlong Fu, Wen-Huang Cheng

Abstract: Theory of Mind (ToM) refers to the ability to infer others' mental states, such as beliefs, desires, and intentions. Current vision-language embodied agents lack ToM-based decision-making, and existing benchmarks focus solely on human mental states while ignoring the agent's own perspective, hindering coherent decision and action generation. To address this, we propose MindPower, a Robot-Centric framework integrating Perception, Mental Reasoning, Decision Making and Action. Given multimodal inputs, MindPower first perceives the environment and human states, then performs ToM Reasoning to model both self and others, and finally generates decisions and actions guided by inferred mental states. Furthermore, we introduce Mind-Reward, a novel optimization objective that encourages VLMs to produce consistent ToM Reasoning and behavior. Our model outperforms GPT-4o by 12.77% in decision making and 12.49% in action generation.

Subjects: Art...
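The article does not give a formula for Mind-Reward, so what follows is only a toy stand-in, not the paper's objective. It is a hypothetical Python reward that scores a structured model output on two things: whether every reasoning field is present, and whether the objects the agent acts on are also mentioned in its reasoning. The field names, the `name(arg)` action syntax, and the 0.3/0.7 weights are all assumptions chosen for illustration.

```python
import re

# Hypothetical output schema: one string per reasoning stage, plus the action.
REASONING_FIELDS = ("belief", "desire", "intention", "decision")
REQUIRED_FIELDS = REASONING_FIELDS + ("action",)


def format_score(output: dict[str, str]) -> float:
    """Fraction of required fields that are present and non-empty."""
    present = sum(1 for k in REQUIRED_FIELDS if output.get(k, "").strip())
    return present / len(REQUIRED_FIELDS)


def consistency_score(output: dict[str, str]) -> float:
    """Fraction of action arguments, e.g. 'keys' in 'pick(keys)', that the
    reasoning fields also mention (a deliberately crude substring match)."""
    args = re.findall(r"\((\w+)\)", output.get("action", ""))
    if not args:
        return 0.0
    reasoning = " ".join(output.get(k, "") for k in REASONING_FIELDS)
    return sum(1 for a in args if a in reasoning) / len(args)


def toy_consistency_reward(output: dict[str, str], w_format: float = 0.3,
                           w_consistency: float = 0.7) -> float:
    """Weighted sum of format and consistency scores (hypothetical weights)."""
    return w_format * format_score(output) + w_consistency * consistency_score(output)


grounded = {
    "belief": "the human thinks the keys are in the drawer; they are on the table",
    "desire": "the human wants the keys",
    "intention": "the robot intends to help",
    "decision": "fetch the keys from the table",
    "action": "navigate(table) pick(keys) hand_over(human)",
}
# Same reasoning, but the action targets an object the reasoning never mentions.
ungrounded = dict(grounded, action="pick(wallet)")

print(toy_consistency_reward(grounded))
print(toy_consistency_reward(ungrounded))
```

A reward of this shape could, in principle, be used in a policy-optimization loop to penalize outputs whose actions drift from their stated reasoning. How MindPower actually measures consistency is specified in the paper, not here.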
