[2508.06869] VSI: Visual Subtitle Integration for Keyframe Selection


arXiv - AI April 13, 2026 4 min read

About this article

Abstract page for arXiv paper 2508.06869: VSI: Visual Subtitle Integration for Keyframe Selection to enhance Long Video Understanding

Computer Science > Computer Vision and Pattern Recognition

arXiv:2508.06869 (cs)
[Submitted on 9 Aug 2025 (v1), last revised 10 Apr 2026 (this version, v4)]

Title: VSI: Visual Subtitle Integration for Keyframe Selection to enhance Long Video Understanding

Authors: Jianxiang He, Meisheng Hong, Jungang Li, Weiyu Guo, Xuming Hu, Hui Xiong

Abstract: Multimodal large language models (MLLMs) demonstrate exceptional performance in vision-language tasks, yet their processing of long videos is constrained by input context length and high computational costs. Sparse frame sampling thus becomes a necessary preprocessing step, with sampled frame quality directly impacting downstream performance. Existing keyframe search algorithms achieve a balance between efficiency and sampled frame quality but heavily rely on the visual modality alone. This makes them difficult to adapt to text-related tasks and often leads to retrieval results deviating from core semantic content. To address this, we propose the VISUAL-SUBTITLE INTEGRATION (VSI), a multimodal keyframe retrieval framework. It employs a dual-branch collaborative retrieval approach combining Video Search and Subtitle Match to fuse complementary visual and textual information for precise localization. Experiments on LongVideoBench and VideoMME demonstr...
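To make the dual-branch idea in the abstract concrete, here is a minimal sketch of fusing a visual relevance score with a subtitle-match score per candidate frame and keeping the top-k. It is not the paper's implementation: the abstract does not specify the scoring or fusion rule, so the token-overlap subtitle score, the weighted-sum fusion, the `alpha` weight, and every name below (`Subtitle`, `subtitle_scores`, `select_keyframes`) are assumptions made for illustration. The visual scores are taken as given inputs, standing in for whatever a video search branch would produce.

```python
import re
from dataclasses import dataclass


@dataclass
class Subtitle:
    """Hypothetical subtitle record: a time span in seconds plus its text."""
    start: float
    end: float
    text: str


def tokenize(text: str) -> set[str]:
    # Crude lowercase word split; a stand-in for a real text encoder (assumption).
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def subtitle_scores(query: str, subtitles: list[Subtitle],
                    frame_times: list[float]) -> list[float]:
    """Score each frame by query-token overlap with subtitles active at that time."""
    q = tokenize(query)
    scores = []
    for t in frame_times:
        best = 0.0
        for s in subtitles:
            if s.start <= t <= s.end and q:
                best = max(best, len(q & tokenize(s.text)) / len(q))
        scores.append(best)
    return scores


def select_keyframes(visual: list[float], textual: list[float],
                     k: int, alpha: float = 0.5) -> list[int]:
    """Weighted-sum fusion (an assumed rule), then top-k indices in temporal order."""
    if len(visual) != len(textual):
        raise ValueError("visual and textual score lists must align per frame")
    fused = [alpha * v + (1 - alpha) * t for v, t in zip(visual, textual)]
    ranked = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
    return sorted(ranked[:k])


# Toy example with made-up numbers: four candidate frames, two subtitle lines.
frame_times = [0.0, 5.0, 10.0, 15.0]
subs = [Subtitle(4.0, 6.0, "the red car arrives"),
        Subtitle(14.0, 16.0, "a dog barks")]
textual = subtitle_scores("red car", subs, frame_times)
visual = [0.9, 0.2, 0.1, 0.3]  # placeholder visual-branch scores
print(select_keyframes(visual, textual, k=2))  # prints [0, 1]
```

In this toy run, frame 1 has a weak visual score but is selected because its subtitle matches the query, which is the kind of case the abstract says visual-only search handles poorly.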

Originally published on April 13, 2026. Curated by AI News.

