[2602.14941] AnchorWeave: World-Consistent Video Generation with Retrieved Local Spatial Memories
Summary
AnchorWeave is a video generation framework that maintains spatial consistency over long horizons by conditioning on multiple clean local geometric memories rather than a single globally reconstructed 3D scene.
Why It Matters
Maintaining spatial consistency is a central challenge in video generation: without it, generated scenes drift and contradict themselves over time. By improving the long-horizon coherence and visual quality of generated videos, this work has implications for applications such as virtual reality and autonomous systems.
Key Takeaways
- AnchorWeave replaces misaligned global memories with multiple clean local geometric memories.
- The framework employs a coverage-driven local memory retrieval aligned with the target trajectory.
- Integration of selected local memories is managed through a multi-anchor weaving controller.
- Extensive experiments show significant improvements in long-term scene consistency and visual quality.
- Ablation studies validate the effectiveness of local geometric conditioning and multi-anchor control.
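The coverage-driven retrieval step above can be pictured as a greedy set-cover over the target camera trajectory: local memories are selected one at a time, each chosen to cover as many not-yet-covered target viewpoints as possible. The paper does not publish its exact algorithm, so the sketch below is purely illustrative; the function name, the distance-based notion of "coverage", and the `radius`/`max_memories` parameters are all assumptions, not the authors' method.

```python
import numpy as np

def retrieve_local_memories(target_poses, memory_poses_list,
                            radius=2.0, max_memories=4):
    """Greedy coverage-driven selection (illustrative sketch, not the
    paper's algorithm): pick local memories whose stored camera positions
    lie near the target trajectory until every target pose is covered or
    the memory budget is exhausted.

    target_poses: (T, 3) array of target camera positions.
    memory_poses_list: list of (N_i, 3) arrays, one per local memory.
    """
    uncovered = np.ones(len(target_poses), dtype=bool)
    selected = []

    def covers(mem_poses):
        # A target pose counts as covered if any pose stored in this
        # local memory lies within `radius` of it.
        d = np.linalg.norm(
            target_poses[:, None, :] - mem_poses[None, :, :], axis=-1)
        return d.min(axis=1) < radius

    for _ in range(max_memories):
        best_idx, best_gain = None, 0
        for i, mem_poses in enumerate(memory_poses_list):
            if i in selected:
                continue
            # Marginal gain: newly covered target poses.
            gain = int(np.count_nonzero(covers(mem_poses) & uncovered))
            if gain > best_gain:
                best_idx, best_gain = i, gain
        if best_idx is None:  # no remaining memory adds coverage
            break
        selected.append(best_idx)
        uncovered &= ~covers(memory_poses_list[best_idx])
        if not uncovered.any():
            break
    return selected
```

For example, with a trajectory spanning two distant regions and one local memory anchored near each region, the greedy loop selects both memories and ignores a third memory far from the trajectory. The selected memories would then be fused by the multi-anchor weaving controller described above, which learns to reconcile their residual cross-view inconsistencies.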
Computer Science > Computer Vision and Pattern Recognition
arXiv:2602.14941 (cs) [Submitted on 16 Feb 2026]
Title: AnchorWeave: World-Consistent Video Generation with Retrieved Local Spatial Memories
Authors: Zun Wang, Han Lin, Jaehong Yoon, Jaemin Cho, Yue Zhang, Mohit Bansal
Abstract: Maintaining spatial world consistency over long horizons remains a central challenge for camera-controllable video generation. Existing memory-based approaches often condition generation on globally reconstructed 3D scenes by rendering anchor videos from the reconstructed geometry in the history. However, reconstructing a global 3D scene from multiple views inevitably introduces cross-view misalignment, as pose and depth estimation errors cause the same surfaces to be reconstructed at slightly different 3D locations across views. When fused, these inconsistencies accumulate into noisy geometry that contaminates the conditioning signals and degrades generation quality. We introduce AnchorWeave, a memory-augmented video generation framework that replaces a single misaligned global memory with multiple clean local geometric memories and learns to reconcile their cross-view inconsistencies. To this end, AnchorWeave performs coverage-driven local memory retrieval aligned with the target trajectory and integrates the selecte...