[2603.04291] CubeComposer: Spatio-Temporal Autoregressive 4K 360° Video Generation from Perspective Video
Computer Science > Computer Vision and Pattern Recognition
arXiv:2603.04291 (cs)
[Submitted on 4 Mar 2026]

Title: CubeComposer: Spatio-Temporal Autoregressive 4K 360° Video Generation from Perspective Video
Authors: Lingen Li, Guangzhi Wang, Xiaoyu Li, Zhaoyang Zhang, Qi Dou, Jinwei Gu, Tianfan Xue, Ying Shan

Abstract: Generating high-quality 360° panoramic videos from perspective input is a key capability for virtual reality (VR), where high resolution is especially important for an immersive experience. Existing methods are constrained by the computational limits of vanilla diffusion models: they support native generation only at ≤ 1K resolution and rely on suboptimal post-hoc super-resolution to reach higher resolutions. We introduce CubeComposer, a novel spatio-temporal autoregressive diffusion model that natively generates 4K-resolution 360° videos. By decomposing videos into cubemap representations with six faces, CubeComposer synthesizes content autoregressively in a well-planned spatio-temporal order, reducing memory demands while enabling high-resolution output. Specifically, to address the challenges of multi-dimensional autoregression, we propose: (1) a spatio-temporal autoregressive strategy that orchestrates 360° video generation across cube faces and time windows...
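To make the cubemap decomposition concrete, here is a minimal sketch of sampling the six faces of a cubemap from an equirectangular (360°) frame. The face names, axis convention, and nearest-neighbour sampling are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def cube_face_to_equirect_coords(face, size):
    """Return (u, v) equirectangular coords in [0, 1] for one cube face."""
    # Pixel-centre grid in [-1, 1] on the unit face plane.
    a = (np.arange(size) + 0.5) / size * 2.0 - 1.0
    x, y = np.meshgrid(a, -a)  # y flipped so +v points up on the face
    ones = np.ones_like(x)
    # Ray directions per face (assumed axis convention: +z forward, +y up).
    dirs = {
        "front": (x, y, ones),   "back": (-x, y, -ones),
        "right": (ones, y, -x),  "left": (-ones, y, x),
        "up":    (x, ones, -y),  "down": (x, -ones, y),
    }
    dx, dy, dz = dirs[face]
    lon = np.arctan2(dx, dz)                  # longitude in [-pi, pi]
    lat = np.arctan2(dy, np.hypot(dx, dz))    # latitude in [-pi/2, pi/2]
    u = (lon / (2 * np.pi) + 0.5) % 1.0
    v = 0.5 - lat / np.pi
    return u, v

def sample_face(equirect, face, size):
    """Nearest-neighbour sample one size x size cube face from an H x W x C frame."""
    h, w = equirect.shape[:2]
    u, v = cube_face_to_equirect_coords(face, size)
    xs = np.clip((u * w).astype(int), 0, w - 1)
    ys = np.clip((v * h).astype(int), 0, h - 1)
    return equirect[ys, xs]

frame = np.random.rand(256, 512, 3)  # toy equirectangular frame
faces = {f: sample_face(frame, f, 128)
         for f in ["front", "back", "left", "right", "up", "down"]}
```

Each face is a small, uniformly sampled perspective-like view, which is what makes per-face generation far cheaper in memory than operating on the full 4K equirectangular frame at once.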