[2602.15892] Egocentric Bias in Vision-Language Models

arXiv - AI · 3 min read

Summary

The paper introduces FlipSet, a benchmark for assessing visual perspective taking in vision-language models, revealing significant egocentric bias in their performance.

Why It Matters

Understanding egocentric bias in vision-language models is crucial for improving AI's social cognition and spatial reasoning capabilities. The findings highlight limitations in current models, which could inform future research and development in AI systems that require perspective-taking abilities.

Key Takeaways

  • FlipSet benchmark assesses Level-2 visual perspective taking in VLMs.
  • Most vision-language models exhibit systematic egocentric bias.
  • Models handle theory of mind and mental rotation well in isolation, but fail when the task requires integrating the two.
  • Current VLMs lack mechanisms to bind social awareness with spatial operations.
  • The study provides insights for improving AI's cognitive capabilities.

Computer Science > Computer Vision and Pattern Recognition
arXiv:2602.15892 (cs) · Submitted on 10 Feb 2026
Title: Egocentric Bias in Vision-Language Models
Authors: Maijunxian Wang, Yijiang Li, Bingyang Wang, Tianwei Zhao, Ran Ji, Qingying Gao, Emmy Liu, Hokin Deng, Dezhi Luo

Abstract: Visual perspective taking -- inferring how the world appears from another's viewpoint -- is foundational to social cognition. We introduce FlipSet, a diagnostic benchmark for Level-2 visual perspective taking (L2 VPT) in vision-language models. The task requires simulating 180-degree rotations of 2D character strings from another agent's perspective, isolating spatial transformation from 3D scene complexity. Evaluating 103 VLMs reveals systematic egocentric bias: the vast majority perform below chance, with roughly three-quarters of errors reproducing the camera viewpoint. Control experiments expose a compositional deficit -- models achieve high theory-of-mind accuracy and above-chance mental rotation in isolation, yet fail catastrophically when integration is required. This dissociation indicates that current VLMs lack the mechanisms needed to bind social awareness to spatial operations, suggesting fundamental limitations in model-based spatial reasoning. FlipSet provides a cognitively grounded testbed for diagnosing perspective-taking capabilities in multimo...
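To make the task concrete: a 180-degree in-plane rotation of a character grid reverses the row order, reverses each row, and maps each glyph to its upside-down counterpart. The sketch below is purely illustrative; the actual FlipSet stimuli and character set are defined in the paper, and the flip map here is a hypothetical example.

```python
# Illustrative sketch of a 180-degree rotation of a 2D character grid,
# the kind of transformation the FlipSet task asks models to simulate.
# The FLIP mapping is hypothetical, not the paper's actual character set.
FLIP = {"b": "q", "q": "b", "d": "p", "p": "d", "u": "n", "n": "u",
        "o": "o", "x": "x", "s": "s", "z": "z"}

def rotate_180(grid):
    """Return the grid as seen by a viewer facing the camera (180-degree flip)."""
    return [
        "".join(FLIP.get(ch, "?") for ch in reversed(row))
        for row in reversed(grid)
    ]

print(rotate_180(["bd", "pu"]))
```

An egocentrically biased model would simply echo the camera view (`["bd", "pu"]`) instead of producing the rotated grid, which is the error pattern the benchmark is designed to detect.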

Related Articles

Llms

[D] How come Muon is only being used for Transformers?

Muon has quickly been adopted in LLM training, yet we don't see it being talked about in other contexts. Searches for Muon on ConvNets tu...

Reddit - Machine Learning · 1 min ·

[P] I trained a language model from scratch for a low resource language and got it running fully on-device on Android (no GPU, demo)

Hi Everybody! I just wanted to share an update on a project I’ve been working on called BULaMU, a family of language models trained (20M,...

Reddit - Machine Learning · 1 min ·

Paper Finds That Leading AI Chatbots Like ChatGPT and Claude Remain Incredibly Sycophantic, Resulting in Twisted Effects on Users

A study found that sycophancy is pervasive among chatbots, and that bots are more likely than human peers to affirm a person's bad behavior.

AI Tools & Products · 6 min ·

Popular AI gateway startup LiteLLM ditches controversial startup Delve | TechCrunch

LiteLLM had obtained two security compliance certifications via Delve and fell victim to some horrific credential-stealing malware last w...

TechCrunch - AI · 3 min ·