[2602.24289] Mode Seeking meets Mean Seeking for Fast Long Video Generation

[2602.24289] Mode Seeking meets Mean Seeking for Fast Long Video Generation

arXiv - Machine Learning 4 min read

About this article

Abstract page for arXiv paper 2602.24289: Mode Seeking meets Mean Seeking for Fast Long Video Generation

Computer Science > Computer Vision and Pattern Recognition arXiv:2602.24289 (cs) [Submitted on 27 Feb 2026] Title:Mode Seeking meets Mean Seeking for Fast Long Video Generation Authors:Shengqu Cai, Weili Nie, Chao Liu, Julius Berner, Lvmin Zhang, Nanye Ma, Hansheng Chen, Maneesh Agrawala, Leonidas Guibas, Gordon Wetzstein, Arash Vahdat View a PDF of the paper titled Mode Seeking meets Mean Seeking for Fast Long Video Generation, by Shengqu Cai and 10 other authors View PDF HTML (experimental) Abstract:Scaling video generation from seconds to minutes faces a critical bottleneck: while short-video data is abundant and high-fidelity, coherent long-form data is scarce and limited to narrow domains. To address this, we propose a training paradigm where Mode Seeking meets Mean Seeking, decoupling local fidelity from long-term coherence based on a unified representation via a Decoupled Diffusion Transformer. Our approach utilizes a global Flow Matching head trained via supervised learning on long videos to capture narrative structure, while simultaneously employing a local Distribution Matching head that aligns sliding windows to a frozen short-video teacher via a mode-seeking reverse-KL divergence. This strategy enables the synthesis of minute-scale videos that learns long-range coherence and motions from limited long videos via supervised flow matching, while inheriting local realism by aligning every sliding-window segment of the student to a frozen short-video teacher, result...

Originally published on March 02, 2026. Curated by AI News.

Related Articles

Machine Learning

I tried building a memory-first AI… and ended up discovering smaller models can beat larger ones

Dataset Model Acc F1 Δ vs Log Δ vs Static Avg Params Peak Params Steps Infer ms Size Banking77-20 Logistic TF-IDF 92.37% 0.9230 +0.00pp +...

Reddit - Artificial Intelligence · 1 min ·
Llms

[D] Howcome Muon is only being used for Transformers?

Muon has quickly been adopted in LLM training, yet we don't see it being talked about in other contexts. Searches for Muon on ConvNets tu...

Reddit - Machine Learning · 1 min ·
Machine Learning

[P] Run Karpathy's Autoresearch for $0.44 instead of $24 — Open-source parallel evolution pipeline on SageMaker Spot

TL;DR: I built an open-source pipeline that runs Karpathy's autoresearch on SageMaker Spot instances — 25 autonomous ML experiments for $...

Reddit - Machine Learning · 1 min ·
Improving AI models’ ability to explain their predictions
Machine Learning

Improving AI models’ ability to explain their predictions

AI News - General · 9 min ·
More in Machine Learning: This Week Guide Trending

No comments

No comments yet. Be the first to comment!

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime