[2603.18856] Motion-o: Trajectory-Grounded Video Reasoning

Computer Science > Computer Vision and Pattern Recognition

arXiv:2603.18856 (cs)

[Submitted on 19 Mar 2026 (v1), last revised 7 May 2026 (this version, v2)]

Title: Motion-o: Trajectory-Grounded Video Reasoning

Authors: Bishoy Galoaa, Shayda Moezzi, Xiangyu Bai, Sarah Ostadabbas

Abstract: Recent video reasoning models increasingly produce spatio-temporal evidence chains that localize objects at specific timestamps. While these traces improve interpretability by grounding where and when evidence appears, they often leave the motion connecting observations, the how, implicit. This makes dynamic and trajectory-dependent claims difficult to supervise, verify, or penalize when unsupported by the video. We formalize this missing component as Spatial-Temporal-Trajectory (STT) reasoning and introduce Motion-o, a motion-centric extension to vision-language models (VLMs) that makes trajectories explicit and verifiable. Motion-o augments evidence chains with Motion Chain of Thought (MCoT), a structured pathway that represents object motion through a discrete <motion/> tag summarizing direction, speed, and scale change. To supervise MCoT, we densify sparse spatio-temporal annotations into object tracks and derive motion descriptors from centroid displacement and box-area change. We then train with complemen...
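The abstract says motion descriptors are derived from centroid displacement and box-area change, then summarized in a discrete <motion/> tag. The Python sketch below illustrates one way such a descriptor could be computed from a two-point object track. It is not the authors' implementation: the bin thresholds (STATIONARY_EPS, SLOW_MAX, FAST_MIN, SCALE_TOL), the function names, and the tag attribute names (dir, speed, scale) are all assumptions made for illustration, since the abstract only names the tag and the three quantities it summarizes.

```python
import math
from typing import List, Tuple

# A box is (x1, y1, x2, y2) in normalized image coordinates (assumption).
Box = Tuple[float, float, float, float]

# Hypothetical bin thresholds; the abstract does not specify any values.
STATIONARY_EPS = 0.02   # minimum centroid displacement to count as motion
SLOW_MAX = 0.1          # speed below this (units/second) is "slow"
FAST_MIN = 0.3          # speed at or above this is "fast"
SCALE_TOL = 0.2         # area ratio within 1 +/- SCALE_TOL is "stable"


def centroid(box: Box) -> Tuple[float, float]:
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def area(box: Box) -> float:
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def motion_descriptor(track: List[Tuple[float, Box]]) -> dict:
    """Summarize a track of (timestamp, box) pairs using its first and last entries."""
    (t0, b0), (t1, b1) = track[0], track[-1]
    (cx0, cy0), (cx1, cy1) = centroid(b0), centroid(b1)
    dx, dy = cx1 - cx0, cy1 - cy0
    dist = math.hypot(dx, dy)

    # Direction: dominant axis of the centroid displacement (image y grows downward).
    if dist < STATIONARY_EPS:
        direction = "stationary"
    elif abs(dx) >= abs(dy):
        direction = "right" if dx > 0 else "left"
    else:
        direction = "down" if dy > 0 else "up"

    # Speed: displacement per second, binned into three levels.
    dt = t1 - t0
    speed_value = dist / dt if dt > 0 else 0.0
    if speed_value < SLOW_MAX:
        speed = "slow"
    elif speed_value < FAST_MIN:
        speed = "medium"
    else:
        speed = "fast"

    # Scale change: ratio of final to initial box area.
    a0, a1 = area(b0), area(b1)
    ratio = a1 / a0 if a0 > 0 else 1.0
    if ratio > 1 + SCALE_TOL:
        scale = "growing"
    elif ratio < 1 - SCALE_TOL:
        scale = "shrinking"
    else:
        scale = "stable"

    return {"dir": direction, "speed": speed, "scale": scale}


def to_tag(desc: dict) -> str:
    """Render the descriptor as a self-closing tag (attribute names are hypothetical)."""
    return '<motion dir="{dir}" speed="{speed}" scale="{scale}"/>'.format(**desc)
```

For example, calling to_tag(motion_descriptor(track)) on a track whose box shifts horizontally without changing size would yield a tag with a horizontal direction and a "stable" scale. Using only the first and last boxes keeps the sketch short; a densified track, as the abstract describes, would allow per-segment descriptors instead.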

Originally published on May 11, 2026. Curated by AI News.
