[2603.25209] Free-Lunch Long Video Generation via Layer-Adaptive O.O.D Correction
Computer Science > Computer Vision and Pattern Recognition

arXiv:2603.25209 (cs) [Submitted on 26 Mar 2026]

Title: Free-Lunch Long Video Generation via Layer-Adaptive O.O.D Correction

Authors: Jiahao Tian, Chenxi Song, Wei Cheng, Chi Zhang

Abstract: Generating long videos with pre-trained video diffusion models, which are typically trained on short clips, is a significant challenge. Directly applying these models to long-video inference often leads to notable degradation in visual quality. This paper identifies two out-of-distribution (O.O.D) problems as the primary cause: frame-level relative position O.O.D and context-length O.O.D. To address them, we propose FreeLOC, a training-free, layer-adaptive framework built on two core techniques: (i) Video-based Relative Position Re-encoding (VRPR), a multi-granularity strategy that hierarchically re-encodes temporal relative positions to align with the model's pre-trained distribution, targeting frame-level relative position O.O.D; and (ii) Tiered Sparse Attention (TSA), which preserves both local detail and long-range dependencies by structuring attention density across different temporal scales, targeting context-length O.O.D. Crucially, we introduce a layer-adaptive probing mechanism that identifies the sensitivity of each transformer layer…
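The abstract does not specify how TSA's tiers are defined. As a rough, hypothetical sketch of the general idea — dense attention over nearby frames and progressively sparser, strided attention at longer temporal distances — one might build a frame-level attention mask like the following (the tier boundaries, stride values, and function name are our own illustrative assumptions, not the paper's actual TSA):

```python
import numpy as np

def tiered_sparse_mask(num_frames: int,
                       local_window: int = 4,
                       mid_stride: int = 2,
                       far_stride: int = 8) -> np.ndarray:
    """Illustrative tiered sparsity over frame indices:
    - tier 1: fully dense within a local window (preserves local detail)
    - tier 2: strided attention at medium temporal distance
    - tier 3: coarser strided attention at long range (keeps long-range links)
    This is a sketch of the concept only, not the paper's method."""
    mask = np.zeros((num_frames, num_frames), dtype=bool)
    for q in range(num_frames):
        for k in range(num_frames):
            d = abs(q - k)
            if d <= local_window:
                mask[q, k] = True                     # tier 1: dense local
            elif d <= 4 * local_window:
                mask[q, k] = (d % mid_stride == 0)    # tier 2: medium stride
            else:
                mask[q, k] = (d % far_stride == 0)    # tier 3: coarse stride
    return mask

mask = tiered_sparse_mask(32)  # boolean mask to apply before softmax
```

Such a mask would be applied to the frame-frame attention logits (setting masked entries to -inf) so that the effective attended context at any layer stays closer to the context lengths seen during pre-training.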