[2603.04910] VPWEM: Non-Markovian Visuomotor Policy with Working and Episodic Memory
Computer Science > Robotics
arXiv:2603.04910 (cs) [Submitted on 5 Mar 2026]

Title: VPWEM: Non-Markovian Visuomotor Policy with Working and Episodic Memory
Authors: Yuheng Lei, Zhixuan Liang, Hongyuan Zhang, Ping Luo

Abstract: Imitation learning from human demonstrations has achieved significant success in robotic control, yet most visuomotor policies still condition on single-step observations or short-context histories, making them struggle with non-Markovian tasks that require long-term memory. Simply enlarging the context window incurs substantial computational and memory costs and encourages overfitting to spurious correlations, leading to catastrophic failures under distribution shift and violating real-time constraints in robotic systems. By contrast, humans can compress important past experiences into long-term memories and exploit them to solve tasks throughout their lifetime. In this paper, we propose VPWEM, a non-Markovian visuomotor policy equipped with working and episodic memories. VPWEM retains a sliding window of recent observation tokens as short-term working memory, and introduces a Transformer-based contextual memory compressor that recursively converts out-of-window observations into a fixed number of episodic memory tokens. The compressor uses self-attention over a cache of past summary token...
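The two-tier memory described in the abstract can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the class and method names (`MemoryPolicyState`, `observe`, `compress`) are hypothetical, the attention uses untrained identity projections, and VPWEM's actual compressor is a learned Transformer. It only demonstrates the control flow: recent observation tokens live in a bounded sliding window, and each evicted token is folded into a fixed number of episodic summary tokens via self-attention over the summary cache.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(queries, kv, dim):
    # Single-head scaled dot-product attention with identity
    # Q/K/V projections (a stand-in for the learned compressor).
    scores = queries @ kv.T / np.sqrt(dim)
    return softmax(scores) @ kv

class MemoryPolicyState:
    """Toy state holder: sliding working memory (recent observation
    tokens) plus a fixed-size bank of episodic memory tokens."""

    def __init__(self, window=4, n_mem=2, dim=8, seed=0):
        self.window, self.n_mem, self.dim = window, n_mem, dim
        rng = np.random.default_rng(seed)
        # Episodic summary tokens: count stays fixed at n_mem forever.
        self.mem = rng.standard_normal((n_mem, dim)) * 0.01
        self.work = []  # working memory, at most `window` tokens

    def observe(self, obs_token):
        # New observations enter working memory; the oldest token is
        # evicted once the window is full and compressed into episodic
        # memory, so per-step cost stays constant regardless of horizon.
        self.work.append(obs_token)
        if len(self.work) > self.window:
            self.compress(self.work.pop(0))

    def compress(self, evicted):
        # Summary tokens attend over [summary cache; evicted token],
        # recursively absorbing out-of-window observations.
        kv = np.vstack([self.mem, evicted[None, :]])
        self.mem = self_attention(self.mem, kv, self.dim)

    def context(self):
        # The policy would condition on episodic + working tokens:
        # always n_mem + window rows at most, independent of episode length.
        return np.vstack([self.mem] + [w[None, :] for w in self.work])
```

Running an episode of any length leaves the policy context at a constant size (`n_mem + window` tokens), which is the point of compressing history instead of enlarging the window:

```python
state = MemoryPolicyState(window=4, n_mem=2, dim=8)
rng = np.random.default_rng(1)
for _ in range(100):                # long episode
    state.observe(rng.standard_normal(8))
print(state.context().shape)        # fixed-size context: (6, 8)
```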