[2601.16296] Memory-V2V: Memory-Augmented Video-to-Video Diffusion for Consistent Multi-Turn Editing
Computer Science > Computer Vision and Pattern Recognition

arXiv:2601.16296 (cs)

[Submitted on 22 Jan 2026 (v1), last revised 23 Mar 2026 (this version, v2)]

Title: Memory-V2V: Memory-Augmented Video-to-Video Diffusion for Consistent Multi-Turn Editing

Authors: Dohun Lee, Chun-Hao Paul Huang, Xuelin Chen, Jong Chul Ye, Duygu Ceylan, Hyeonho Jeong

Abstract: Video-to-video diffusion models achieve impressive single-turn editing performance, but practical editing workflows are inherently iterative. When edits are applied sequentially, existing models treat each turn independently, often causing previously generated regions to drift or be overwritten. We identify this failure mode as the problem of cross-turn consistency in multi-turn video editing. We introduce Memory-V2V, a memory-augmented framework that treats prior edits as structured constraints for subsequent generations. Memory-V2V maintains an external memory of previous outputs, retrieves task-relevant edits, and integrates them through relevance-aware tokenization and adaptive compression. These technical ingredients enable scalable conditioning without linear growth in computation. We demonstrate Memory-V2V on iterative video novel view synthesis and text-guided long video editing. Memory-V2V substantially enhances cross-turn consistency w...
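The abstract describes the memory mechanism only at a high level: an external memory of prior outputs, relevance-based retrieval, and adaptive compression that keeps conditioning cost bounded across turns. As a rough illustration of that idea, and not the paper's actual method, the PyTorch sketch below shows one way such a memory could work. The class name `EditMemory`, the mean-pooled retrieval keys, the cosine-similarity top-k retrieval, and the adaptive average-pooling compression are all assumptions made for this sketch.

```python
import torch
import torch.nn.functional as F


class EditMemory:
    """Hypothetical external memory of prior edit outputs.

    Each entry stores a pooled key embedding (for retrieval) and the
    token sequence of a previously generated edit (for conditioning).
    """

    def __init__(self, top_k: int = 2, tokens_per_entry: int = 64):
        self.top_k = top_k
        self.tokens_per_entry = tokens_per_entry  # compression budget per retrieved edit
        self.keys: list[torch.Tensor] = []        # one (d,) key per stored edit
        self.values: list[torch.Tensor] = []      # one (n_i, d) token sequence per edit

    def write(self, edit_tokens: torch.Tensor) -> None:
        # Mean-pool the edit's tokens into a retrieval key (assumption:
        # the paper does not specify how keys are formed).
        self.keys.append(edit_tokens.mean(dim=0))
        self.values.append(edit_tokens)

    def read(self, query: torch.Tensor) -> torch.Tensor:
        """Retrieve the top-k most relevant prior edits and compress each
        to a fixed token budget, so the conditioning sequence stays
        bounded no matter how many turns have been stored."""
        if not self.keys:
            return torch.empty(0, query.shape[-1])
        keys = torch.stack(self.keys)                         # (N, d)
        scores = F.cosine_similarity(keys, query[None], dim=-1)
        k = min(self.top_k, len(self.keys))
        idx = scores.topk(k).indices
        compressed = [self._compress(self.values[i]) for i in idx]
        return torch.cat(compressed, dim=0)                   # (<= k * budget, d)

    def _compress(self, tokens: torch.Tensor) -> torch.Tensor:
        # Adaptive average pooling down to the per-entry budget, a crude
        # stand-in for the paper's learned adaptive compression.
        n, _ = tokens.shape
        if n <= self.tokens_per_entry:
            return tokens
        pooled = F.adaptive_avg_pool1d(
            tokens.t().unsqueeze(0), self.tokens_per_entry    # (1, d, budget)
        )
        return pooled.squeeze(0).t()                          # (budget, d)


# Usage sketch: after each editing turn, write the new output's tokens;
# before the next turn, read memory and append the retrieved tokens to
# the diffusion model's conditioning sequence.
memory = EditMemory(top_k=2, tokens_per_entry=64)
for turn in range(3):
    edit_tokens = torch.randn(256, 128)   # placeholder for a turn's latent tokens
    query = torch.randn(128)              # placeholder for the next prompt embedding
    memory_tokens = memory.read(query)
    print(f"turn {turn}: retrieved {memory_tokens.shape[0]} memory tokens")
    memory.write(edit_tokens)
```

Because each retrieved edit is squeezed to a fixed budget and only the top-k entries are read, the conditioning length here is capped at `top_k * tokens_per_entry` tokens per turn, which is one concrete way to realize the "scalable conditioning without linear growth in computation" claim made in the abstract.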