[2511.01266] MotionStream: Real-Time Video Generation with Interactive Motion Controls
Computer Science > Computer Vision and Pattern Recognition

arXiv:2511.01266 (cs)

[Submitted on 3 Nov 2025 (v1), last revised 1 Mar 2026 (this version, v3)]

Title: MotionStream: Real-Time Video Generation with Interactive Motion Controls

Authors: Joonghyuk Shin, Zhengqi Li, Richard Zhang, Jun-Yan Zhu, Jaesik Park, Eli Shechtman, Xun Huang

Abstract: Current motion-conditioned video generation methods suffer from prohibitive latency (minutes per video) and non-causal processing that prevents real-time interaction. We present MotionStream, which enables sub-second latency and streaming generation at up to 29 FPS on a single GPU. Our approach begins by augmenting a text-to-video model with motion control; the resulting model generates high-quality videos that adhere to the global text prompt and local motion guidance, but it cannot perform inference on the fly. We therefore distill this bidirectional teacher into a causal student through Self Forcing with Distribution Matching Distillation, enabling real-time streaming inference. Several key challenges arise when generating videos over long, potentially infinite time horizons: (1) bridging the domain gap between training on finite-length videos and extrapolating to infinite horizons, (2) sustaining high quality by preventing error accumulation, and (3) maintaining fast inference without incur...
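The abstract's central mechanism, distilling a bidirectional teacher into a causal student so frames can be emitted while generation is still running, amounts to block-wise autoregressive denoising over a bounded window of past context. The sketch below is a minimal illustration of that streaming loop under stated assumptions, not the authors' implementation: `student.generate_block`, the motion-stream iterator, and `MAX_CACHE_BLOCKS` are all hypothetical names and values chosen for exposition.

```python
# Minimal sketch (NOT the paper's code) of causal, block-wise streaming
# generation with a bounded sliding-window KV cache. `student.generate_block`
# is a hypothetical stand-in for a few-step distilled causal denoiser.

from collections import deque

MAX_CACHE_BLOCKS = 8  # assumed window size; bounds per-step compute and memory

def stream_video(student, text_prompt, motion_stream, num_blocks):
    """Yield video blocks one at a time, conditioned on live motion input."""
    kv_cache = deque(maxlen=MAX_CACHE_BLOCKS)  # oldest blocks evicted automatically
    for _ in range(num_blocks):
        motion = next(motion_stream)  # latest user-drawn motion trajectories
        # The causal student attends only to the cached window of past blocks,
        # so per-block latency stays constant regardless of video length.
        block, new_kv = student.generate_block(
            prompt=text_prompt,
            motion=motion,
            past_kv=list(kv_cache),
        )
        kv_cache.append(new_kv)
        yield block  # frames can be displayed immediately (sub-second latency)
```

Bounding the cache is what keeps per-block cost constant, the property challenge (3) in the abstract asks for, and yielding blocks as they are produced lets a UI thread display frames while the user keeps drawing trajectories, which is what makes the interaction real-time rather than batch.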