[2601.22228] Lost in Space? Vision-Language Models Struggle with Relative Camera Pose Estimation



Computer Science > Computer Vision and Pattern Recognition
arXiv:2601.22228 (cs)
Submitted on 29 Jan 2026 (v1); last revised 29 Apr 2026 (this version, v2)

Title: Lost in Space? Vision-Language Models Struggle with Relative Camera Pose Estimation
Authors: Ken Deng, Yifu Qiu, Yoni Kasten, Shay B. Cohen, Yftah Ziser

Abstract: We study whether vision-language models (VLMs) can solve relative camera pose estimation (RCPE) from image pairs, a direct test of multi-view spatial reasoning. We cast RCPE as a discrete verbal classification task and introduce VRRPI-Bench, built from real RGB-D frames with object-centric camera motion, and VRRPI-Diag, which isolates individual motion degrees of freedom. Humans (0.91) and specialized geometric pipelines such as LoFTR (0.99) solve the task reliably, yet the best VLM reaches only 0.66 and most others remain near random. Our analyses show that this gap does not stem from a lack of basic spatial competence: strong VLMs are near ceiling on single-image benchmarks, but most fall to near random once reasoning must span views. They are unstable under source-target reversal (best: 59.7% consistency) and remain weak even in simplified single-DoF settings, especially on optical-axis motions such as roll and depth translation (GPT-5: 0.46 on roll). These failures are useful: they...
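To make the task setup concrete, here is a minimal sketch of how RCPE can be cast as discrete verbal classification, and how source-target reversal consistency can be checked. The axis conventions, label names, and function names are illustrative assumptions, not the paper's exact protocol or label set.

```python
import numpy as np

# Assumed camera-frame convention (hypothetical): +x right, +y up, +z forward.
# Each axis maps to a (negative-direction, positive-direction) verbal label pair.
LABELS = {0: ("left", "right"), 1: ("down", "up"), 2: ("backward", "forward")}

def relative_pose(T_src, T_tgt):
    """4x4 transform taking the source camera frame to the target camera frame."""
    return np.linalg.inv(T_src) @ T_tgt

def verbal_translation_class(T_src, T_tgt):
    """Discretize the relative translation into a verbal label along its dominant axis."""
    t = relative_pose(T_src, T_tgt)[:3, 3]
    axis = int(np.argmax(np.abs(t)))
    return LABELS[axis][int(t[axis] > 0)]

# Example: target camera sits 1 m along +x from the source camera.
T_src = np.eye(4)
T_tgt = np.eye(4)
T_tgt[0, 3] = 1.0

fwd = verbal_translation_class(T_src, T_tgt)   # "right"
rev = verbal_translation_class(T_tgt, T_src)   # "left"
# Reversal consistency: swapping source and target must flip the label.
print(fwd, rev)
```

A model that answers "right" for the pair (A, B) but anything other than "left" for (B, A) is inconsistent under source-target reversal, which is the failure mode the abstract quantifies (best model: 59.7% consistency).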

Originally published on May 01, 2026. Curated by AI News.

Related Articles

- [2604.17460] Agentic Education: Using Claude Code to Teach Claude Code (arXiv - AI, 4 min)
- [2603.09117] Decoupling Reasoning and Confidence: Resurrecting Calibration in Reinforcement Learning from Verifiable Rewards (arXiv - AI, 3 min)
- [2602.10140] Can Large Language Models Implement Agent-Based Models? An ODD-based Replication Study (arXiv - AI, 4 min)
- [2601.14289] RPC-Bench: A Fine-grained Benchmark for Research Paper Comprehension (arXiv - AI, 3 min)