[2602.07801] VideoTemp-o3: Harmonizing Temporal Grounding and Video Understanding in Agentic Thinking-with-Videos
Computer Science > Computer Vision and Pattern Recognition

arXiv:2602.07801 (cs)

[Submitted on 8 Feb 2026 (v1), last revised 3 Mar 2026 (this version, v2)]

Title: VideoTemp-o3: Harmonizing Temporal Grounding and Video Understanding in Agentic Thinking-with-Videos

Authors: Wenqi Liu, Yunxiao Wang, Shijie Ma, Meng Liu, Qile Su, Tianke Zhang, Haonan Fan, Changyi Liu, Kaiyu Jiang, Jiankang Chen, Kaiyu Tang, Bin Wen, Fan Yang, Tingting Gao, Han Li, Yinwei Wei, Xuemeng Song

Abstract: In long-video understanding, conventional uniform frame sampling often fails to capture key visual evidence, leading to degraded performance and increased hallucinations. To address this, recent agentic thinking-with-videos paradigms have emerged, adopting a localize-clip-answer pipeline in which the model actively identifies relevant video segments, performs dense sampling within those clips, and then produces answers. However, existing methods remain inefficient, suffer from weak localization, and adhere to rigid workflows. To solve these issues, we propose VideoTemp-o3, a unified agentic thinking-with-videos framework that jointly models video grounding and question answering. VideoTemp-o3 exhibits strong localization capability, supports on-demand clipping, and can refine inaccurate localizations. Specific...
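For readers unfamiliar with the localize-clip-answer pipeline the abstract describes, the following is a minimal Python sketch of that control flow, including the refinement of inaccurate localizations the paper mentions. Every helper (localize_segments, sample_frames, answer) and every parameter (dense_fps, conf_threshold, max_rounds) is a hypothetical placeholder for model-specific components, not VideoTemp-o3's actual interface.

# Minimal sketch of an agentic localize-clip-answer loop.
# All helpers below are hypothetical stubs, not the paper's API.

from typing import List, Tuple

Segment = Tuple[float, float]  # (start_sec, end_sec) of a candidate clip

def localize_segments(video_path: str, question: str) -> List[Segment]:
    """Step 1: temporally ground the question to candidate video segments."""
    raise NotImplementedError  # model-specific grounding component

def sample_frames(video_path: str, segment: Segment, fps: float) -> list:
    """Step 2: densely sample frames inside a localized clip."""
    raise NotImplementedError  # e.g. via a video decoding library

def answer(question: str, frames: list) -> Tuple[str, float]:
    """Step 3: answer from the sampled evidence; returns (answer, confidence)."""
    raise NotImplementedError  # model-specific QA component

def thinking_with_video(video_path: str, question: str,
                        dense_fps: float = 4.0,
                        conf_threshold: float = 0.7,
                        max_rounds: int = 3) -> str:
    """Run localize -> clip -> answer, re-localizing when confidence is low."""
    segments = localize_segments(video_path, question)
    best = "unanswerable"
    for _ in range(max_rounds):
        # Dense sampling only inside the localized clips, not the full video.
        frames = [f for seg in segments
                  for f in sample_frames(video_path, seg, dense_fps)]
        best, conf = answer(question, frames)
        if conf >= conf_threshold:
            break  # evidence sufficient: stop early (on-demand clipping)
        # Otherwise refine the localization; in practice this step would be
        # conditioned on feedback from the failed attempt.
        segments = localize_segments(video_path, question)
    return best

The loop makes the two failure modes the abstract names concrete: weak localization corresponds to localize_segments returning the wrong segments, and a rigid workflow corresponds to removing the confidence check and refinement rounds.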