[2602.21829] StoryMovie: A Dataset for Semantic Alignment of Visual Stories with Movie Scripts and Subtitles

Summary

The paper introduces StoryMovie, a dataset of 1,757 stories aligned with movie scripts and subtitles, built to improve dialogue attribution and the portrayal of character interactions in visual storytelling models.

Why It Matters

Visual storytelling models can ground entities in images correctly and still hallucinate semantic relationships: who said what, how characters interact, and what they feel. StoryMovie targets this gap by tying generated stories to screenplay dialogue and subtitle timing, which makes it relevant to AI applications in entertainment and media that generate narratives from visual input.

Key Takeaways

  • StoryMovie consists of 1,757 stories aligned with movie scripts and subtitles.
  • The alignment pipeline enables dialogue attribution by linking character names from scripts to subtitle timestamps.
  • The authors fine-tune Qwen Storyteller3 on the dataset, building on prior work in visual grounding and entity re-identification.
  • With DeepSeek V3 as judge, Storyteller3 achieves an 89.9% win rate against base Qwen2.5-VL 7B on subtitle alignment.
  • A comparison with Storyteller, which was trained without script grounding, is used to assess what script alignment adds beyond visual grounding alone.

Computer Science > Computer Vision and Pattern Recognition

arXiv:2602.21829 (cs) [Submitted on 25 Feb 2026]

Title: StoryMovie: A Dataset for Semantic Alignment of Visual Stories with Movie Scripts and Subtitles

Authors: Daniel Oliveira, David Martins de Matos

Abstract: Visual storytelling models that correctly ground entities in images may still hallucinate semantic relationships, generating incorrect dialogue attribution, character interactions, or emotional states. We introduce StoryMovie, a dataset of 1,757 stories aligned with movie scripts and subtitles through LCS matching. Our alignment pipeline synchronizes screenplay dialogue with subtitle timestamps, enabling dialogue attribution by linking character names from scripts to temporal positions from subtitles. Using this aligned content, we generate stories that maintain visual grounding tags while incorporating authentic character names, dialogue, and relationship dynamics. We fine-tune Qwen Storyteller3 on this dataset, building on prior work in visual grounding and entity re-identification. Evaluation using DeepSeek V3 as judge shows that Storyteller3 achieves an 89.9% win rate against base Qwen2.5-VL 7B on subtitle alignment. Compared to Storyteller, trained without script grounding, Storyteller3 achieves 48.5% versu...
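
The abstract names LCS (longest common subsequence) matching as the way screenplay dialogue is synchronized with subtitle timestamps, but gives no implementation details. The sketch below is one plausible reading, not the authors' pipeline: it runs a word-level LCS between the script and the subtitles, then assigns each subtitle the speaker whose script line shares the most matched words with it. The function names (`tokenize`, `lcs_pairs`, `attribute_speakers`), the tuple layouts, and the toy dialogue are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a line and split it into word tokens (illustrative choice)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def lcs_pairs(a, b):
    """Return index pairs (i, j) forming one longest common subsequence."""
    n, m = len(a), len(b)
    # dp[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk forward through the table to recover the matched positions.
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def attribute_speakers(script, subtitles):
    """Attach a speaker name to each subtitle via word-level LCS.

    script:    list of (speaker, line) in screenplay order (hypothetical layout)
    subtitles: list of (start_sec, end_sec, text) in time order (hypothetical layout)
    """
    script_tokens, script_owner = [], []
    for k, (_, line) in enumerate(script):
        for tok in tokenize(line):
            script_tokens.append(tok)
            script_owner.append(k)

    sub_tokens, sub_owner = [], []
    for k, (_, _, text) in enumerate(subtitles):
        for tok in tokenize(text):
            sub_tokens.append(tok)
            sub_owner.append(k)

    # Each matched word votes for the speaker of the script line it came from.
    votes = defaultdict(Counter)
    for i, j in lcs_pairs(script_tokens, sub_tokens):
        speaker = script[script_owner[i]][0]
        votes[sub_owner[j]][speaker] += 1

    aligned = []
    for k, (start, end, text) in enumerate(subtitles):
        counts = votes.get(k)
        speaker = counts.most_common(1)[0][0] if counts else None
        aligned.append((start, end, speaker, text))
    return aligned


# Toy data invented for this example; it does not come from the dataset.
script = [
    ("ANNA", "Where were you last night?"),
    ("BEN", "I was at the station, waiting for you."),
]
subtitles = [
    (12.0, 13.5, "Where were you last night?"),
    (14.0, 15.2, "I was at the station,"),
    (15.2, 16.4, "waiting for you."),
    (17.0, 18.0, "[door slams]"),
]
aligned = attribute_speakers(script, subtitles)
for row in aligned:
    print(row)
```

On this toy input, the script line that the subtitles split in two is attributed to BEN in both parts, and the sound-effect caption, which has no counterpart in the script, is left without a speaker. The table in `lcs_pairs` grows with the product of the two token counts, so a full-length film would likely need windowed or line-level matching; that trade-off is an observation about this sketch, not a claim about the paper.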
