[2505.06537] ProFashion: Prototype-guided Fashion Video Generation with Multiple Reference Images
Computer Science > Computer Vision and Pattern Recognition
arXiv:2505.06537 (cs)
[Submitted on 10 May 2025 (v1), last revised 31 Mar 2026 (this version, v2)]
Title: ProFashion: Prototype-guided Fashion Video Generation with Multiple Reference Images
Authors: Xianghao Kong, Qiaosong Qi, Yuanbin Wang, Biaolong Chen, Aixi Zhang, Anyi Rao
Abstract: Fashion video generation aims to synthesize temporally consistent videos from reference images of a designated character. Despite significant progress, existing diffusion-based methods accept only a single reference image as input, which severely limits their ability to generate view-consistent fashion videos, especially when the clothing shows different patterns from different viewpoints. Moreover, the widely adopted motion module does not sufficiently model human body movement, leading to sub-optimal spatiotemporal consistency. To address these issues, we propose ProFashion, a fashion video generation framework that leverages multiple reference images to achieve improved view consistency and temporal coherence. To effectively exploit features from multiple reference images while maintaining a reasonable computational cost, we devise a Pose-aware Prototype Aggregator, which selects and aggregates global and fine-grained reference features according t...
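The abstract describes the Pose-aware Prototype Aggregator only at a high level: it selects and aggregates global and fine-grained features from several reference images, conditioned on pose. The sketch below is purely illustrative and not the paper's actual implementation; the function `aggregate_prototypes`, the pose-similarity scoring, and all tensor shapes are assumptions. It shows one plausible reading: score each reference by the similarity of its global descriptor to the target-frame pose embedding, then take a softmax-weighted average of the fine-grained token features.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def aggregate_prototypes(ref_feats, pose_query):
    """Hypothetical pose-guided aggregation over multiple references.

    ref_feats:  (K, N, D) token features from K reference images.
    pose_query: (D,) embedding of the target-frame pose.
    Returns an (N, D) prototype feature map.
    """
    # Global descriptor per reference: mean over the N spatial tokens.
    global_desc = ref_feats.mean(axis=1)                 # (K, D)
    # Score each reference by similarity to the target pose (assumed dot product).
    weights = softmax(global_desc @ pose_query)          # (K,)
    # Pose-weighted aggregation of the fine-grained token features.
    return np.einsum('k,knd->nd', weights, ref_feats)    # (N, D)

# Toy usage: 4 references, 16 tokens, 8-dim features.
rng = np.random.default_rng(0)
proto = aggregate_prototypes(rng.normal(size=(4, 16, 8)),
                             rng.normal(size=(8,)))
print(proto.shape)  # (16, 8)
```

The key property this sketch preserves from the abstract is cost: however many references K are given, downstream layers see a single prototype feature map, so compute does not grow with K.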