[2601.23232] ShotFinder: Imagination-Driven Open-Domain Video Shot Retrieval via Web Search

arXiv - AI · 4 min read

Summary

ShotFinder introduces a novel benchmark for open-domain video shot retrieval, utilizing LLMs to enhance video search capabilities through imaginative query expansion and controlled retrieval processes.

Why It Matters

As video content proliferates, effective retrieval methods are essential for users to find relevant clips quickly. ShotFinder addresses existing gaps in video retrieval by formalizing editing requirements and providing a structured approach to enhance search engine capabilities, which is crucial for both academic research and practical applications in media.

Key Takeaways

  • ShotFinder formalizes video shot retrieval with keyframe-oriented descriptions.
  • It introduces five controllable constraints for improved retrieval accuracy.
  • Experiments reveal significant performance gaps compared to human capabilities.
  • Temporal localization is more manageable than color and visual style retrieval.
  • The benchmark aims to advance multimodal large models in video search tasks.

Computer Science > Computer Vision and Pattern Recognition
arXiv:2601.23232 (cs)
[Submitted on 30 Jan 2026 (v1), last revised 14 Feb 2026 (this version, v3)]

Title: ShotFinder: Imagination-Driven Open-Domain Video Shot Retrieval via Web Search

Authors: Tao Yu, Haopeng Jin, Hao Wang, Shenghua Chai, Yujia Yang, Junhao Gong, Jiaming Guo, Minghui Zhang, Xinlong Chen, Zhenghao Zhang, Yuxuan Zhou, Yufei Xiong, Shanbin Zhang, Jiabing Yang, Hongzhu Yi, Xinming Wang, Cheng Zhong, Xiao Ma, Zhang Zhang, Yan Huang, Liang Wang

Abstract: In recent years, large language models (LLMs) have made rapid progress in information retrieval, yet existing research has mainly focused on text or static multimodal settings. Open-domain video shot retrieval, which involves richer temporal structure and more complex semantics, still lacks systematic benchmarks and analysis. To fill this gap, we introduce ShotFinder, a benchmark that formalizes editing requirements as keyframe-oriented shot descriptions and introduces five types of controllable single-factor constraints: Temporal order, Color, Visual style, Audio, and Resolution. We curate 1,210 high-quality samples from YouTube across 20 thematic categories, using large models for generation with human verification. Based on the benchmark, we propose ShotFinder, a text-driven three-stag...
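The abstract describes each benchmark sample as a keyframe-oriented shot description paired with one of five controllable single-factor constraints. The paper does not publish a schema, but a minimal sketch of how such a query might be represented (all names here are hypothetical, not from the paper) looks like:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

# The five single-factor constraint types named in the ShotFinder abstract.
class Constraint(Enum):
    TEMPORAL_ORDER = "temporal_order"
    COLOR = "color"
    VISUAL_STYLE = "visual_style"
    AUDIO = "audio"
    RESOLUTION = "resolution"

# Hypothetical query record: one keyframe-oriented description plus at most
# one constraint, mirroring the "single-factor" design described in the paper.
@dataclass
class ShotQuery:
    description: str
    constraint: Optional[Constraint] = None
    constraint_value: Optional[str] = None

# Example query: retrieve a shot matching a description, constrained on color.
query = ShotQuery(
    description="A close-up of rain hitting a window at night",
    constraint=Constraint.COLOR,
    constraint_value="black and white",
)
print(query.constraint.value)  # → color
```

Keeping the constraint single-factor, as the benchmark does, makes it possible to attribute retrieval failures to one dimension at a time (e.g. the reported gap on color and visual style versus temporal localization).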

Related Articles

LLMs

My AI spent last night modifying its own codebase

I've been working on a local AI system called Apis that runs completely offline through Ollama. During a background run, Apis identified ...

Reddit - Artificial Intelligence · 1 min ·
LLMs

Fake users generated by AI can't simulate humans — review of 182 research papers. Your thoughts?

https://www.researchsquare.com/article/rs-9057643/v1 There’s a massive trend right now where tech companies, businesses, even researchers...

Reddit - Artificial Intelligence · 1 min ·
LLMs

Depth-first pruning seems to transfer from GPT-2 to Llama (unexpectedly well)

TL;DR: Removing the right transformer layers (instead of shrinking all layers) gives smaller, faster models with minimal quality loss — a...

Reddit - Artificial Intelligence · 1 min ·
LLMs

[2603.23966] Policy-Guided Threat Hunting: An LLM enabled Framework with Splunk SOC Triage

Abstract page for arXiv paper 2603.23966: Policy-Guided Threat Hunting: An LLM enabled Framework with Splunk SOC Triage

arXiv - AI · 4 min ·