[2602.23228] MovieTeller: Tool-augmented Movie Synopsis with ID Consistent Progressive Abstraction
Summary
The paper presents MovieTeller, a novel framework for generating movie synopses using tool-augmented progressive abstraction to enhance character consistency and narrative coherence in automated video summarization.
Why It Matters
As digital entertainment grows, automated video summarization becomes increasingly important for content indexing and recommendation. MovieTeller targets two weaknesses of existing Vision-Language Models on long-form video: inconsistent character identification and fragmented narrative coherence. It grounds generation in facts supplied by an external tool rather than relying on fine-tuning.
Key Takeaways
- MovieTeller enhances movie synopsis generation through tool-augmented methods.
- The framework avoids costly model fine-tuning by using off-the-shelf models.
- It improves character identification and narrative coherence in long-form videos.
- Progressive abstraction helps manage context length limitations of current models.
- Experiments show significant improvements over traditional end-to-end approaches.
Computer Science > Computer Vision and Pattern Recognition
arXiv:2602.23228 (cs)
[Submitted on 26 Feb 2026]
Title: MovieTeller: Tool-augmented Movie Synopsis with ID Consistent Progressive Abstraction
Authors: Yizhi Li, Xiaohan Chen, Miao Jiang, Wentao Tang, Gaoang Wang
Abstract: With the explosive growth of digital entertainment, automated video summarization has become indispensable for applications such as content indexing, personalized recommendation, and efficient media archiving. Automatic synopsis generation for long-form videos, such as movies and TV series, presents a significant challenge for existing Vision-Language Models (VLMs). While proficient at single-image captioning, these general-purpose models often exhibit critical failures in long-duration contexts, primarily a lack of ID-consistent character identification and a fractured narrative coherence. To overcome these limitations, we propose MovieTeller, a novel framework for generating movie synopses via tool-augmented progressive abstraction. Our core contribution is a training-free, tool-augmented, fact-grounded generation process. Instead of requiring costly model fine-tuning, our framework directly leverages off-the-shelf models in a plug-and-play manner. We first invoke a specialized face recognition model as an external "tool" ...
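The abstract is cut off before it describes the pipeline in detail, so the following is only a minimal sketch of the general idea named in the title: tag scene descriptions with character names from an external tool, then summarize in stages so that no single step has to fit the whole movie into one context window. Every name here (`ground_scene`, `progressive_abstraction`, `toy_summarize`, `group_size`) is hypothetical and not taken from the paper. The `summarize` callable stands in for an off-the-shelf model, and the character names stand in for the output of a face recognition tool.

```python
from typing import Callable, List


def ground_scene(description: str, character_names: List[str]) -> str:
    """Prefix a scene description with character names.

    Assumption: the names would come from an external face recognition tool;
    here they are simply passed in by the caller.
    """
    if not character_names:
        return description
    return f"[characters: {', '.join(character_names)}] {description}"


def progressive_abstraction(
    scene_notes: List[str],
    summarize: Callable[[List[str]], str],
    group_size: int = 4,
) -> str:
    """Summarize groups of texts level by level until one text remains.

    Each call to `summarize` sees at most `group_size` texts, which keeps
    the input of every step bounded regardless of movie length.
    """
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    if not scene_notes:
        return ""
    level = list(scene_notes)
    while len(level) > 1:
        # Collapse consecutive groups into one summary each.
        level = [
            summarize(level[i:i + group_size])
            for i in range(0, len(level), group_size)
        ]
    return level[0]


def toy_summarize(texts: List[str]) -> str:
    """Placeholder for a model call: joins the texts instead of summarizing."""
    return " | ".join(texts)


if __name__ == "__main__":
    notes = [
        ground_scene("enters the room", ["Alice"]),
        "the door closes",
        "a car chase begins",
    ]
    print(progressive_abstraction(notes, toy_summarize, group_size=2))
```

In a real setting `toy_summarize` would be replaced by a call to a language or vision-language model, and the grouping would more plausibly follow scene or chapter boundaries than a fixed `group_size`.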