[2602.14157] When Test-Time Guidance Is Enough: Fast Image and Video Editing with Diffusion Guidance
Summary
The paper shows that test-time guidance with diffusion models can match training-based approaches to image and video editing while avoiding the computational burden of vector-Jacobian product (VJP) computations.
Why It Matters
This research matters because it addresses a key limitation of existing guidance-based editing methods: their reliance on costly vector-Jacobian product computations. By demonstrating that a cheap, VJP-free form of test-time guidance can still yield high-quality results, it opens the door to more efficient editing workflows in applications such as content creation and media production.
Key Takeaways
- Test-time guidance can effectively replace training-based methods for image and video editing.
- The proposed VJP-free approximation reduces computational costs significantly.
- Empirical evaluations show performance on par with or exceeding traditional methods.
- This approach enhances the practicality of diffusion models in real-world applications.
- The research contributes to the growing field of efficient generative models.
Computer Science > Computer Vision and Pattern Recognition
arXiv:2602.14157 (cs)
[Submitted on 15 Feb 2026]
Title: When Test-Time Guidance Is Enough: Fast Image and Video Editing with Diffusion Guidance
Authors: Ahmed Ghorbel, Badr Moufad, Navid Bagheri Shouraki, Alain Oliviero Durmus, Thomas Hirtz, Eric Moulines, Jimmy Olsson, Yazid Janati
Abstract: Text-driven image and video editing can be naturally cast as inpainting problems, where masked regions are reconstructed to remain consistent with both the observed content and the editing prompt. Recent advances in test-time guidance for diffusion and flow models provide a principled framework for this task; however, existing methods rely on costly vector-Jacobian product (VJP) computations to approximate the intractable guidance term, limiting their practical applicability. Building upon the recent work of Moufad et al. (2025), we provide theoretical insights into their VJP-free approximation and substantially extend their empirical evaluation to large-scale image and video editing benchmarks. Our results demonstrate that test-time guidance alone can achieve performance comparable to, and in some cases surpass, training-based methods.
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine L...
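The abstract contrasts exact guidance, which requires a vector-Jacobian product through the denoiser, with the VJP-free approximation it studies. The toy sketch below is my own illustration (not the paper's actual algorithm): it uses a linear "denoiser" so the Jacobian is explicit, and shows that the exact guidance term pulls the loss gradient back through that Jacobian, while a VJP-free variant simply skips the backward pass and uses the gradient with respect to the denoised estimate directly.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8

# Toy linear "denoiser": x0_hat = W @ x_t. In a real diffusion model this
# is a large neural network, so the VJP W.T @ v below would correspond to
# a full (and expensive) backward pass through the network.
W = rng.standard_normal((d, d)) / np.sqrt(d)

def denoise(x_t):
    return W @ x_t

# Inpainting-style objective on the denoised estimate:
# 0.5 * ||mask * (x0_hat - y)||^2, where y holds the observed pixels.
mask = (rng.random(d) > 0.5).astype(float)
y = rng.standard_normal(d)

def loss_grad_wrt_x0(x0_hat):
    # Gradient of the objective with respect to x0_hat.
    return mask * (x0_hat - y)

x_t = rng.standard_normal(d)
x0_hat = denoise(x_t)
v = loss_grad_wrt_x0(x0_hat)

# Exact guidance term: the chain rule pulls v back through the denoiser's
# Jacobian (here just W), i.e. a vector-Jacobian product.
guidance_vjp = W.T @ v

# VJP-free approximation: skip the backward pass and use v directly,
# i.e. treat the denoiser's Jacobian as (a multiple of) the identity.
guidance_free = v

print(guidance_vjp.shape, guidance_free.shape)  # both (8,)
```

The cost difference is the whole point: `guidance_free` needs only the forward denoising pass already computed at each sampling step, whereas `guidance_vjp` additionally requires differentiating through the network.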