[2601.22228] Lost in Space? Vision-Language Models Struggle with Relative Camera Pose Estimation
Computer Science > Computer Vision and Pattern Recognition

arXiv:2601.22228 (cs)

[Submitted on 29 Jan 2026 (v1), last revised 29 Apr 2026 (this version, v2)]

Title: Lost in Space? Vision-Language Models Struggle with Relative Camera Pose Estimation
Authors: Ken Deng, Yifu Qiu, Yoni Kasten, Shay B. Cohen, Yftah Ziser

Abstract: We study whether vision-language models (VLMs) can solve relative camera pose estimation (RCPE) from image pairs, a direct test of multi-view spatial reasoning. We cast RCPE as a discrete verbal classification task and introduce \texttt{VRRPI-Bench}, built from real RGB-D frames with object-centric camera motion, and \texttt{VRRPI-Diag}, which isolates individual motion degrees of freedom. Humans (0.91 accuracy) and specialized geometric pipelines such as LoFTR (0.99) solve the task reliably, yet the best VLM reaches only 0.66 and most others remain near random. Our analyses show that this gap does not stem from a lack of basic spatial competence: strong VLMs are near ceiling on single-image benchmarks, yet fall to near-random accuracy once reasoning must span views. They are also unstable under source-target reversal (59.7\% consistency for the best model) and remain weak even in simplified single-DoF settings, especially on optical-axis motions such as roll and depth translation (GPT-5: 0.46 on roll). These failures are useful: they...
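The abstract does not spell out how a relative pose is turned into a discrete verbal class, but the underlying two-view geometry is standard. The sketch below is a minimal illustration, not the paper's actual protocol: it assumes world-to-camera extrinsics (x_cam = R @ x_world + t), and the function names (relative_pose, verbal_labels), thresholds, and class vocabulary are all hypothetical. It also demonstrates the two diagnostics the abstract mentions: single-DoF motions (here, pure roll about the optical axis) and source-target reversal, under which the relative pose inverts and a consistent predictor must flip its answer.

import numpy as np

def relative_pose(R1, t1, R2, t2):
    """Pose taking camera-1 coordinates to camera-2 coordinates.

    Assumes world-to-camera extrinsics, x_cam = R @ x_world + t, so
    R_rel = R2 @ R1.T and t_rel = t2 - R_rel @ t1. Swapping the two
    cameras yields the inverse: (R_rel.T, -R_rel.T @ t_rel).
    """
    R_rel = R2 @ R1.T
    t_rel = t2 - R_rel @ t1
    return R_rel, t_rel

def verbal_labels(R_rel, t_rel, rot_thresh_deg=5.0, trans_thresh=0.05):
    """Bin a relative pose into coarse verbal motion classes.

    A hypothetical discretization: thresholds and class names are
    illustrative only. Axes follow the usual camera convention
    (x right, y down, z forward along the optical axis); exact
    sign conventions for pan/tilt/roll vary between datasets.
    """
    labels = []

    # Rotation: axis-angle form; the dominant axis selects pan/tilt/roll.
    cos_angle = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    angle_deg = np.degrees(np.arccos(cos_angle))
    if angle_deg > rot_thresh_deg:
        # Rotation axis from the skew-symmetric part of R_rel
        # (degenerates near 180 degrees; fine for a sketch).
        axis = np.array([R_rel[2, 1] - R_rel[1, 2],
                         R_rel[0, 2] - R_rel[2, 0],
                         R_rel[1, 0] - R_rel[0, 1]])
        axis /= np.linalg.norm(axis)
        names = [("tilt down", "tilt up"),   # x-axis (pitch)
                 ("pan right", "pan left"),  # y-axis (yaw)
                 ("roll cw", "roll ccw")]    # z-axis: the optical-axis case
        k = int(np.argmax(np.abs(axis)))
        labels.append(names[k][0] if axis[k] > 0 else names[k][1])

    # Translation: camera-2 center expressed in camera-1's frame,
    # i.e. d = R1 @ (C2 - C1) = -R_rel.T @ t_rel.
    d = -R_rel.T @ t_rel
    dirs = [("move right", "move left"),
            ("move down", "move up"),
            ("move forward", "move backward")]
    k = int(np.argmax(np.abs(d)))
    if abs(d[k]) > trans_thresh:
        labels.append(dirs[k][0] if d[k] > 0 else dirs[k][1])

    return labels or ["static"]

if __name__ == "__main__":
    # Single-DoF diagnostic: pure roll about the optical axis, the
    # motion the abstract reports VLMs handle worst.
    th = np.radians(20.0)
    R1, t1 = np.eye(3), np.zeros(3)
    R2 = np.array([[np.cos(th), -np.sin(th), 0.0],
                   [np.sin(th),  np.cos(th), 0.0],
                   [0.0,         0.0,        1.0]])
    t2 = np.zeros(3)
    print(verbal_labels(*relative_pose(R1, t1, R2, t2)))  # a single roll label

    # Source-target reversal: the reversed pair gives the inverse pose,
    # so the direction labels should flip for a consistent predictor.
    R_fwd, t_fwd = relative_pose(R1, t1, R2, t2)
    R_bwd, t_bwd = relative_pose(R2, t2, R1, t1)
    assert np.allclose(R_bwd, R_fwd.T) and np.allclose(t_bwd, -R_fwd.T @ t_fwd)

The reversal identity in the final assert is what makes the paper's consistency metric well defined: the ground-truth answer for the swapped image pair is fully determined by the answer for the original pair, so any disagreement is attributable to the model rather than to the task.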