[2602.00181] CamReasoner: Reinforcing Camera Movement Understanding via Structured Spatial Reasoning
arXiv:2602.00181 (Computer Science > Computer Vision and Pattern Recognition)
Submitted on 30 Jan 2026 (v1), last revised 14 Apr 2026 (this version, v3)
Authors: Hang Wu, Yujun Cai, Zehao Li, Haonan Ge, Bowen Sun, Junsong Yuan, Yiwei Wang

Abstract: Understanding camera dynamics is a fundamental pillar of video spatial intelligence. However, existing multimodal models predominantly treat this task as a black-box classification, often confusing physically distinct motions by relying on superficial visual patterns rather than geometric cues. We present CamReasoner, a framework that reformulates camera movement understanding as a structured inference process to bridge the gap between perception and cinematic logic. Our approach centers on the Observation-Thinking-Answer (O-T-A) paradigm, which compels the model to articulate spatio-temporal observations and reason about motion patterns within an explicit reasoning block. To instill this capability, we construct a Large-scale Inference Trajectory Suite comprising 18k SFT reasoning chains and 38k RL feedback samples. To the best of our knowledge, we are the first to employ RL for logical alignment in camera movement understanding, ensuring...
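The abstract says that the O-T-A paradigm forces the model to write observations and reasoning before it answers, and that RL feedback samples are used for alignment. It does not give the exact output delimiters or the reward definition. The sketch below is therefore a hypothetical illustration, not the authors' method. It assumes XML-style tags (`<observation>`, `<think>`, `<answer>`) and a simple rule-based reward made of a format term plus an exact-match answer term. The tag names, the helper names `parse_ota` and `ota_reward`, and the weights are all invented for illustration.

```python
import re
from typing import Optional

# Hypothetical delimiters: the abstract names the Observation-Thinking-Answer
# stages but not the exact tags a model would emit.
OTA_PATTERN = re.compile(
    r"^\s*<observation>(?P<observation>.+?)</observation>\s*"
    r"<think>(?P<think>.+?)</think>\s*"
    r"<answer>(?P<answer>.+?)</answer>\s*$",
    re.DOTALL,
)


def parse_ota(response: str) -> Optional[dict]:
    """Split a response into its three O-T-A blocks, or return None if malformed."""
    match = OTA_PATTERN.match(response)
    if match is None:
        return None
    return {key: value.strip() for key, value in match.groupdict().items()}


def ota_reward(response: str, gold_label: str,
               format_weight: float = 0.5, accuracy_weight: float = 1.0) -> float:
    """Hypothetical rule-based reward: format term plus case-insensitive exact-match term."""
    parsed = parse_ota(response)
    if parsed is None:
        return 0.0  # output that skips the O-T-A structure earns nothing
    reward = format_weight  # well-formed O-T-A structure
    if parsed["answer"].lower() == gold_label.strip().lower():
        reward += accuracy_weight  # correct camera-movement label
    return reward


sample = ("<observation>Background slides left while parallax grows.</observation>"
          "<think>Parallax implies camera translation, not pure rotation.</think>"
          "<answer>truck right</answer>")
print(ota_reward(sample, "truck right"))   # 1.5
print(ota_reward("truck right", "truck right"))  # 0.0: bare answer, no O-T-A blocks
```

Under this assumed reward, a bare label with no articulated observation or reasoning scores zero even when it is correct. That is one way a reward could push a policy toward the structured inference the abstract describes, instead of black-box classification.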