[2603.02175] Kiwi-Edit: Versatile Video Editing via Instruction and Reference Guidance

[2603.02175] Kiwi-Edit: Versatile Video Editing via Instruction and Reference Guidance

arXiv - AI 3 min read

About this article

Abstract page for arXiv paper 2603.02175: Kiwi-Edit: Versatile Video Editing via Instruction and Reference Guidance

Computer Science > Computer Vision and Pattern Recognition arXiv:2603.02175 (cs) [Submitted on 2 Mar 2026] Title:Kiwi-Edit: Versatile Video Editing via Instruction and Reference Guidance Authors:Yiqi Lin, Guoqiang Liang, Ziyun Zeng, Zechen Bai, Yanzhe Chen, Mike Zheng Shou View a PDF of the paper titled Kiwi-Edit: Versatile Video Editing via Instruction and Reference Guidance, by Yiqi Lin and 5 other authors View PDF HTML (experimental) Abstract:Instruction-based video editing has witnessed rapid progress, yet current methods often struggle with precise visual control, as natural language is inherently limited in describing complex visual nuances. Although reference-guided editing offers a robust solution, its potential is currently bottlenecked by the scarcity of high-quality paired training data. To bridge this gap, we introduce a scalable data generation pipeline that transforms existing video editing pairs into high-fidelity training quadruplets, leveraging image generative models to create synthesized reference scaffolds. Using this pipeline, we construct RefVIE, a large-scale dataset tailored for instruction-reference-following tasks, and establish RefVIE-Bench for comprehensive evaluation. Furthermore, we propose a unified editing architecture, Kiwi-Edit, that synergizes learnable queries and latent visual features for reference semantic guidance. Our model achieves significant gains in instruction following and reference fidelity via a progressive multi-stage train...

Originally published on March 03, 2026. Curated by AI News.

Related Articles

[2603.14841] Real-Time Driver Safety Scoring Through Inverse Crash Probability Modeling
Machine Learning

[2603.14841] Real-Time Driver Safety Scoring Through Inverse Crash Probability Modeling

Abstract page for arXiv paper 2603.14841: Real-Time Driver Safety Scoring Through Inverse Crash Probability Modeling

arXiv - AI · 4 min ·
[2603.17839] How do LLMs Compute Verbal Confidence
Llms

[2603.17839] How do LLMs Compute Verbal Confidence

Abstract page for arXiv paper 2603.17839: How do LLMs Compute Verbal Confidence

arXiv - AI · 4 min ·
[2603.15970] 100x Cost & Latency Reduction: Performance Analysis of AI Query Approximation using Lightweight Proxy Models
Llms

[2603.15970] 100x Cost & Latency Reduction: Performance Analysis of AI Query Approximation using Lightweight Proxy Models

Abstract page for arXiv paper 2603.15970: 100x Cost & Latency Reduction: Performance Analysis of AI Query Approximation using Lightweight...

arXiv - AI · 4 min ·
[2603.09085] Not All News Is Equal: Topic- and Event-Conditional Sentiment from Finetuned LLMs for Aluminum Price Forecasting
Llms

[2603.09085] Not All News Is Equal: Topic- and Event-Conditional Sentiment from Finetuned LLMs for Aluminum Price Forecasting

Abstract page for arXiv paper 2603.09085: Not All News Is Equal: Topic- and Event-Conditional Sentiment from Finetuned LLMs for Aluminum ...

arXiv - AI · 4 min ·
More in Machine Learning: This Week Guide Trending

No comments

No comments yet. Be the first to comment!

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime