[2602.16826] HiVAE: Hierarchical Latent Variables for Scalable Theory of Mind
Summary
The paper presents HiVAE, a hierarchical variational architecture that scales theory of mind (ToM) reasoning beyond small gridworlds, enabling AI systems to infer agents' hidden goals and mental states in realistic spatiotemporal domains.
Why It Matters
This research addresses a critical gap in AI's ability to understand and predict human-like behavior in realistic scenarios. By scaling theory of mind reasoning, it has implications for developing more sophisticated AI systems in various applications, from robotics to social interactions.
Key Takeaways
- HiVAE introduces a three-level VAE hierarchy, inspired by belief-desire-intention cognition, for theory of mind reasoning (a minimal sketch follows this list).
- The architecture yields substantial performance gains on a 3,185-node campus navigation task.
- A limitation is identified: learned representations lack grounding in actual mental states.
- Self-supervised alignment strategies are proposed to address this limitation.
- The authors seek community feedback to refine grounding approaches.
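To make the three-level hierarchy concrete, here is a minimal PyTorch sketch of a hierarchical VAE with bottom-up inference and a top-down generative chain z3 -> z2 -> z1 -> x. The abstract does not specify the architecture, so every name, dimension, and the informal mapping of levels onto belief, desire, and intention is an illustrative assumption, not the authors' implementation.

```python
# Minimal three-level hierarchical VAE sketch (assumed structure, not
# the paper's code). Generative chain: z3 -> z2 -> z1 -> x, loosely
# analogous to desire -> intention -> belief -> observed trajectory.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentBlock(nn.Module):
    """Gaussian head producing a reparameterized sample plus its moments."""
    def __init__(self, in_dim, z_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, z_dim)
        self.logvar = nn.Linear(hidden, z_dim)

    def forward(self, h):
        h = self.net(h)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        return z, mu, logvar

def kl_gauss(mu_q, lv_q, mu_p, lv_p):
    """KL divergence between diagonal Gaussians q and p, summed over dims."""
    return 0.5 * (lv_p - lv_q + (lv_q.exp() + (mu_q - mu_p) ** 2) / lv_p.exp() - 1).sum(-1)

class HierVAE(nn.Module):
    def __init__(self, x_dim, z_dims=(32, 16, 8)):
        super().__init__()
        d1, d2, d3 = z_dims
        self.enc = nn.Sequential(nn.Linear(x_dim, 128), nn.ReLU())
        # Bottom-up inference path: q(z1|x), q(z2|z1), q(z3|z2).
        self.q1, self.q2, self.q3 = LatentBlock(128, d1), LatentBlock(d1, d2), LatentBlock(d2, d3)
        # Top-down conditional priors: p(z2|z3), p(z1|z2); p(z3) = N(0, I).
        self.p2, self.p1 = LatentBlock(d3, d2), LatentBlock(d2, d1)
        self.dec = nn.Sequential(nn.Linear(d1, 128), nn.ReLU(), nn.Linear(128, x_dim))

    def forward(self, x):
        z1, mu1, lv1 = self.q1(self.enc(x))
        z2, mu2, lv2 = self.q2(z1)
        z3, mu3, lv3 = self.q3(z2)
        _, pmu2, plv2 = self.p2(z3)   # prior over z2 conditioned on z3
        _, pmu1, plv1 = self.p1(z2)   # prior over z1 conditioned on z2
        recon = self.dec(z1)
        kl = (kl_gauss(mu3, lv3, torch.zeros_like(mu3), torch.zeros_like(lv3))
              + kl_gauss(mu2, lv2, pmu2, plv2)
              + kl_gauss(mu1, lv1, pmu1, plv1))
        rec = F.mse_loss(recon, x, reduction="none").sum(-1)
        return (rec + kl).mean()  # negative ELBO, averaged over the batch
```

A forward pass like `HierVAE(x_dim=64)(torch.randn(4, 64))` returns the negative ELBO. In practice the trajectory encoder and decoder would likely be sequential (e.g., recurrent) rather than the flat MLPs used here for brevity.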
arXiv:2602.16826 [cs.LG] (Submitted on 18 Feb 2026)
Title: HiVAE: Hierarchical Latent Variables for Scalable Theory of Mind
Authors: Nigel Doering, Rahath Malladi, Arshia Sangwan, David Danks, Tauhidur Rahman
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
DOI: https://doi.org/10.48550/arXiv.2602.16826

Abstract: Theory of mind (ToM) enables AI systems to infer agents' hidden goals and mental states, but existing approaches focus mainly on small, human-understandable gridworld spaces. We introduce HiVAE, a hierarchical variational architecture that scales ToM reasoning to realistic spatiotemporal domains. Inspired by the belief-desire-intention structure of human cognition, our three-level VAE hierarchy achieves substantial performance improvements on a 3,185-node campus navigation task. However, we identify a critical limitation: while our hierarchical structure improves prediction, learned latent representations lack explicit grounding in actual mental states. We propose self-supervised alignment strategies and present this work to solicit community feedback on grounding approaches.
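The abstract does not describe the proposed self-supervised alignment strategies, so the following is one plausible pattern rather than the authors' method: an InfoNCE-style auxiliary loss that ties the top-level latent to a hindsight-observable proxy for the goal, such as the navigation node the agent eventually reaches. `GoalAlignment`, `n_nodes`, and `goal_ids` are hypothetical names introduced for illustration.

```python
# Hypothetical alignment loss (assumed, not from the paper): pull the
# top-level latent z3 toward an embedding of the agent's hindsight goal
# node, treating all other nodes as InfoNCE negatives.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GoalAlignment(nn.Module):
    def __init__(self, n_nodes, z_dim, temperature=0.1):
        super().__init__()
        self.node_emb = nn.Embedding(n_nodes, z_dim)  # one embedding per map node
        self.tau = temperature

    def forward(self, z3, goal_ids):
        # Cosine-similarity logits between each latent and every node embedding.
        z = F.normalize(z3, dim=-1)
        e = F.normalize(self.node_emb.weight, dim=-1)
        logits = z @ e.t() / self.tau
        # Cross-entropy against the hindsight goal node is InfoNCE with
        # the remaining nodes acting as negatives.
        return F.cross_entropy(logits, goal_ids)
```

Added to the ELBO with a small weight, such a term would ground the top-level latent in an observable correlate of the agent's goal without manual mental-state labels, which is what "self-supervised" suggests here; whether this matches the authors' proposal cannot be determined from the abstract.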