[2603.25716] Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models
Computer Science > Computer Vision and Pattern Recognition
arXiv:2603.25716 (cs)
[Submitted on 26 Mar 2026]

Title: Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models
Authors: Kaijin Chen, Dingkang Liang, Xin Zhou, Yikang Ding, Xiaoqiang Liu, Pengfei Wan, Xiang Bai

Abstract: Video world models have shown immense potential in simulating the physical world, yet existing memory mechanisms primarily treat environments as static canvases. When dynamic subjects move out of sight and later re-emerge, current methods often struggle, producing frozen, distorted, or vanishing subjects. To address this, we introduce Hybrid Memory, a novel paradigm requiring models to act simultaneously as precise archivists for static backgrounds and vigilant trackers for dynamic subjects, ensuring motion continuity during out-of-view intervals. To facilitate research in this direction, we construct HM-World, the first large-scale video dataset dedicated to hybrid memory. It features 59K high-fidelity clips with decoupled camera and subject trajectories, encompassing 17 diverse scenes, 49 distinct subjects, and meticulously designed exit-entry events to rigorously evaluate hybrid coherence. Furthermore, we propose HyDRA, a specialized memory architecture that compresses memory into tokens and utilizes a ...