[2512.08477] ContextDrag: Precise Drag-Based Image Editing via Context-Preserving Token Injection and Position-Aligned Attention
Computer Science > Computer Vision and Pattern Recognition
arXiv:2512.08477 (cs)
[Submitted on 9 Dec 2025 (v1), last revised 4 Apr 2026 (this version, v2)]

Title: ContextDrag: Precise Drag-Based Image Editing via Context-Preserving Token Injection and Position-Aligned Attention
Authors: Huiguo He, Pengyu Yan, Ziqi Yi, Weizhi Zhong, Zheng Liu, Yejun Tang, Huan Yang, Guanbin Li, Lianwen Jin

Abstract: Drag-based image editing enables intuitive visual manipulation through point-based drag operations. Existing methods rely mainly on diffusion inversion or on pixel-space warping with inpainting. However, inversion inherently introduces approximation errors that degrade texture fidelity, whereas rigid pixel-space operations discard semantic context and produce unnatural deformations. To address these issues, we introduce ContextDrag, to our knowledge the first framework that brings drag-based manipulation into the in-context image editing paradigm. By leveraging the in-context capabilities of editing models (e.g., FLUX-Kontext), ContextDrag enables precise drag editing without inversion or fine-tuning. Specifically, we first propose Context-preserving Token Injection (CTI), which injects VAE-encoded reference features into attention layers at spatially aligned target positions, g...
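The core idea behind the token-injection step can be sketched abstractly: reference-image tokens are concatenated into the key/value sequence of an attention layer, so that queries from the edited image can attend directly to the preserved source features. This is only a minimal NumPy illustration of that general mechanism, not the paper's implementation; all function and variable names here (`attention_with_injection`, `k_ref`, `v_ref`) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_injection(q, k, v, k_ref, v_ref):
    """Scaled dot-product attention where reference tokens are injected.

    q:            (n_tgt, d) queries from the image being edited
    k, v:         (n_tgt, d) keys/values from the image being edited
    k_ref, v_ref: (n_ref, d) keys/values derived from reference features
                  (in ContextDrag these would come from VAE-encoded
                  reference tokens; here they are arbitrary arrays)
    """
    # Inject reference tokens by extending the key/value sequence.
    k_all = np.concatenate([k, k_ref], axis=0)
    v_all = np.concatenate([v, v_ref], axis=0)
    d = q.shape[-1]
    scores = q @ k_all.T / np.sqrt(d)        # (n_tgt, n_tgt + n_ref)
    return softmax(scores, axis=-1) @ v_all  # (n_tgt, d)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.standard_normal((4, 8))
    k = rng.standard_normal((4, 8))
    v = rng.standard_normal((4, 8))
    k_ref = rng.standard_normal((6, 8))
    v_ref = rng.standard_normal((6, 8))
    out = attention_with_injection(q, k, v, k_ref, v_ref)
    print(out.shape)
```

The output retains the query-side shape `(4, 8)`; injection only widens the set of tokens each query can attend to, which is why no retraining or inversion of the backbone is required for this kind of conditioning.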