[2602.14237] AbracADDbra: Touch-Guided Object Addition by Decoupling Placement and Editing Subtasks
Summary
The paper presents AbracADDbra, a framework for touch-guided object addition in images that decouples the placement subtask from the editing subtask: a vision-language transformer predicts where the object should go from a touch input, and a diffusion model then generates the object and blends it into the scene.
Why It Matters
This research addresses two common obstacles in object addition — the ambiguity of text-only prompts and the tedium of mask-based inputs — by grounding succinct instructions with intuitive touch input, improving both usability and placement accuracy. The accompanying Touch2Add benchmark provides a standardized way to evaluate this interactive task, making the work relevant to future creative tools and computer vision applications.
Key Takeaways
- AbracADDbra improves object addition by separating placement and editing tasks.
- The framework uses touch input to spatially ground short text instructions, improving the user experience.
- A vision-language transformer handles touch-guided placement, while a diffusion model jointly generates the object and an instance mask for high-fidelity blending.
- The Touch2Add benchmark allows for standardized evaluation of object addition methods.
- Initial placement accuracy is crucial for achieving high-quality edits.
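The decoupled design in the takeaways above can be sketched as two independent stages: placement first, then mask-guided editing that sees only the predicted box. The function names, box format, and stand-in logic below are illustrative assumptions, not the paper's actual API or models:

```python
# Sketch of a decoupled touch-guided object-addition pipeline.
# Stage 1 predicts a placement box from a touch point and an instruction;
# Stage 2 edits only inside that box and returns an instance mask.
# All names and the box format are hypothetical stand-ins.

from dataclasses import dataclass


@dataclass
class PlacementBox:
    x: int
    y: int
    w: int
    h: int


def predict_placement(touch_xy, instruction, img_size):
    """Stage 1 stand-in: center a fixed-size box on the touch point,
    clamped to the image bounds (a real system would query a VLM)."""
    w = h = min(img_size) // 4
    x = min(max(touch_xy[0] - w // 2, 0), img_size[0] - w)
    y = min(max(touch_xy[1] - h // 2, 0), img_size[1] - h)
    return PlacementBox(x, y, w, h)


def edit_with_mask(image, box, instruction):
    """Stage 2 stand-in: fill the box region and return (image, mask).
    A real system would run a diffusion model conditioned on the box,
    jointly producing the object pixels and its instance mask."""
    mask = [[0] * len(image[0]) for _ in image]
    for r in range(box.y, box.y + box.h):
        for c in range(box.x, box.x + box.w):
            image[r][c] = instruction[0]  # placeholder "object" pixels
            mask[r][c] = 1
    return image, mask


# Decoupling in action: stage 2 only sees the box, never the raw touch point.
img = [["." for _ in range(16)] for _ in range(16)]
box = predict_placement((8, 8), "add a cat", (16, 16))
edited, mask = edit_with_mask(img, box, "cat")
```

The key property this sketch illustrates is the interface between the stages: a single bounding box, so either stage can be improved or evaluated in isolation.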
Computer Science > Computer Vision and Pattern Recognition
arXiv:2602.14237 (cs)
Submitted on 15 Feb 2026
Title: AbracADDbra: Touch-Guided Object Addition by Decoupling Placement and Editing Subtasks
Authors: Kunal Swami, Raghu Chittersu, Yuvraj Rathore, Rajeev Irny, Shashavali Doodekula, Alok Shukla
Abstract: Instruction-based object addition is often hindered by the ambiguity of text-only prompts or the tedious nature of mask-based inputs. To address this usability gap, we introduce AbracADDbra, a user-friendly framework that leverages intuitive touch priors to spatially ground succinct instructions for precise placement. Our efficient, decoupled architecture uses a vision-language transformer for touch-guided placement, followed by a diffusion model that jointly generates the object and an instance mask for high-fidelity blending. To facilitate standardized evaluation, we contribute the Touch2Add benchmark for this interactive task. Our extensive evaluations, where our placement model significantly outperforms both random placement and general-purpose VLM baselines, confirm the framework's ability to produce high-fidelity edits. Furthermore, our analysis reveals a strong correlation between initial placement accuracy and final edit quality, validating our decoupled approach. This work thus p...
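The abstract reports that the placement model outperforms random-placement and VLM baselines, which presumes a way to score a predicted box against a reference. The paper's exact metric is not given here; intersection-over-union (IoU) is a standard choice for box agreement and serves as a plausible sketch:

```python
# IoU between two axis-aligned boxes in (x, y, w, h) form.
# Used here as an assumed placement-accuracy metric; the paper's
# actual evaluation protocol may differ.

def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(a[0], b[0]))  # overlap width
    ih = max(0, min(ay2, by2) - max(a[1], b[1]))  # overlap height
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0


# A perfect placement scores 1.0; disjoint boxes score 0.0.
print(iou((0, 0, 4, 4), (0, 0, 4, 4)))  # 1.0
print(iou((0, 0, 4, 4), (10, 10, 4, 4)))  # 0.0
```

Averaging such a score over a benchmark like Touch2Add would give the kind of placement-accuracy number that can then be correlated with final edit quality, as the analysis in the abstract describes.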