[2602.08277] PISCO: Precise Video Instance Insertion with Sparse Control
Computer Science > Computer Vision and Pattern Recognition
arXiv:2602.08277 (cs)
[Submitted on 9 Feb 2026 (v1), last revised 27 Mar 2026 (this version, v2)]

Title: PISCO: Precise Video Instance Insertion with Sparse Control
Authors: Xiangbo Gao, Renjie Li, Xinghao Chen, Yuheng Wu, Suofei Feng, Qing Yin, Zhengzhong Tu

Abstract: The landscape of AI video generation is undergoing a pivotal shift: moving beyond general generation, which relies on exhaustive prompt engineering and "cherry-picking", towards fine-grained, controllable generation and high-fidelity post-processing. In professional AI-assisted filmmaking, it is crucial to perform precise, targeted modifications. A cornerstone of this transition is video instance insertion, which requires inserting a specific instance into existing footage while maintaining scene integrity. Unlike traditional video editing, this task imposes several requirements: precise spatial-temporal placement, physically consistent scene interaction, and faithful preservation of the original dynamics, all with minimal user effort. In this paper, we propose PISCO, a video diffusion model for precise video instance insertion with arbitrary sparse keyframe control. PISCO allows users to specify a single keyframe, start-and-end keyframes, or sparse keyframes at arbitrary timestamps, and...