[2510.02253] DragFlow: Unleashing DiT Priors with Region Based Supervision for Drag Editing
Computer Science > Computer Vision and Pattern Recognition
arXiv:2510.02253 (cs)
[Submitted on 2 Oct 2025 (v1), last revised 1 Mar 2026 (this version, v3)]
Title: DragFlow: Unleashing DiT Priors with Region Based Supervision for Drag Editing
Authors: Zihan Zhou, Shilin Lu, Shuli Leng, Shaocong Zhang, Zhuming Lian, Xinlei Yu, Adams Wai-Kin Kong
Abstract: Drag-based image editing has long suffered from distortions in the target region, largely because the priors of earlier base models, such as Stable Diffusion, are insufficient to project optimized latents back onto the natural image manifold. With the shift from UNet-based DDPMs to more scalable DiTs with flow matching (e.g., SD3.5, FLUX), generative priors have become significantly stronger, enabling advances across diverse editing tasks. However, drag-based editing has yet to benefit from these stronger priors. This work proposes the first framework to effectively harness FLUX's rich prior for drag-based editing, dubbed DragFlow, achieving substantial gains over baselines. We first show that directly applying point-based drag editing to DiTs performs poorly: unlike the highly compressed features of UNets, DiT features are insufficiently structured to provide reliable guidance for point-wise motion supervision. To overcome this limitation, DragFlow introduces a re...
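The abstract contrasts point-wise motion supervision (comparing single feature vectors at a handle and a target location, as in UNet-era drag editing) with the region-based supervision named in the title. The following toy sketch illustrates that distinction only; the function names, shapes, and L1 distance are illustrative assumptions, not the paper's implementation:

```python
# Toy contrast of point-wise vs. region-based feature supervision.
# All names, shapes, and the L1 distance are illustrative assumptions.
import numpy as np

def point_loss(feat, handle, target):
    """Distance between the single feature vectors at two pixels."""
    return float(np.abs(feat[handle] - feat[target]).sum())

def region_loss(feat, handle, target, r=1):
    """Distance between (2r+1)x(2r+1) feature patches around two pixels,
    aggregating local structure instead of one point."""
    hy, hx = handle
    ty, tx = target
    h_patch = feat[hy - r:hy + r + 1, hx - r:hx + r + 1]
    t_patch = feat[ty - r:ty + r + 1, tx - r:tx + r + 1]
    return float(np.abs(h_patch - t_patch).mean())

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8, 4))  # toy H x W x C feature map
print(point_loss(feat, (2, 2), (5, 5)))
print(region_loss(feat, (2, 2), (5, 5)))
```

The intuition the sketch conveys: a point-wise loss depends on a single feature vector, so less structured features give noisy guidance, while a region-based loss averages over a patch and is more robust to that noise.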