[2411.19121] MSG Score: Automated Video Verification for Reliable Multi-Scene Generation
Computer Science > Computer Vision and Pattern Recognition
arXiv:2411.19121 (cs)
[Submitted on 28 Nov 2024 (v1), last revised 8 Apr 2026 (this version, v2)]
Title: MSG Score: Automated Video Verification for Reliable Multi-Scene Generation
Authors: Daewon Yoon, Hyeongseok Lee, Wonsik Shin, Sangyu Han, Nojun Kwak
Abstract: While text-to-video diffusion models have advanced significantly, creating coherent long-form content remains unreliable due to stochastic sampling artifacts. This necessitates generating multiple candidates, yet verifying them creates a severe bottleneck; manual review is unscalable, and existing automated metrics lack the adaptability and speed required for runtime monitoring. Another critical issue is the trade-off between evaluation quality and run-time performance: metrics that best capture human-like judgment are often too slow to support iterative generation. These challenges, originating from the lack of an effective evaluation, motivate our work toward a novel solution. To address this, we propose a scalable automated verification framework for long-form video. First, we introduce the MSG (Multi-Scene Generation) score, a hierarchical attention-based metric that adaptively evaluates narrative and visual consistency. This serves as the core verifier within our CGS (Candidate Generat...