[2511.21678] Agentic Learner with Grow-and-Refine Multimodal Semantic Memory
Computer Science > Artificial Intelligence
arXiv:2511.21678 (cs)
[Submitted on 26 Nov 2025 (v1), last revised 2 May 2026 (this version, v2)]

Title: Agentic Learner with Grow-and-Refine Multimodal Semantic Memory
Authors: Weihao Bo, Shan Zhang, Yanpeng Sun, Jingjing Wu, Qunyi Xie, Xiao Tan, Kunbin Chen, Wei He, Xiaofan Li, Na Zhao, Jingdong Wang, Zechao Li

Abstract: MLLMs exhibit strong reasoning on isolated queries, yet they operate de novo -- solving each problem independently and often repeating the same mistakes. Existing memory-augmented agents mainly store past trajectories for reuse. However, trajectory-based memory suffers from brevity bias, gradually losing essential domain knowledge. More critically, even in truly multimodal problem-solving settings, it records only a single-modality trace of past behavior, failing to preserve how visual attention and logical reasoning jointly contributed to the solution. This is fundamentally misaligned with human cognition: semantic memory is both multimodal and integrated, preserving visual and abstract knowledge through coordinated but distinct representational streams. We thus introduce ViLoMem, a dual-stream memory framework that constructs compact, schema-based memory. It separately encodes visual distraction patterns and logical reasoning errors, enabling MLLMs to learn...
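The abstract describes a dual-stream, schema-based memory that grows new entries from errors and refines existing ones. A minimal sketch of that idea, assuming a simple keyed schema store per stream (all class, field, and method names here are illustrative assumptions, not the paper's actual implementation):

```python
# Hypothetical sketch of a dual-stream grow-and-refine semantic memory,
# in the spirit of ViLoMem as described in the abstract. All names are
# illustrative assumptions, not the paper's actual API.
from dataclasses import dataclass


@dataclass
class Schema:
    """A compact memory entry: a lesson distilled from a past error."""
    key: str          # short description of the error pattern
    guideline: str    # corrective rule to apply on future queries
    hits: int = 1     # how often this schema has been reinforced


class DualStreamMemory:
    """Two separate streams: visual distraction patterns and logical errors."""

    def __init__(self):
        self.visual: dict[str, Schema] = {}
        self.logical: dict[str, Schema] = {}

    def grow(self, stream: str, key: str, guideline: str) -> None:
        """Grow a new schema, or refine (reinforce/update) an existing one."""
        store = self.visual if stream == "visual" else self.logical
        if key in store:                  # refine: update the prior lesson
            store[key].guideline = guideline
            store[key].hits += 1
        else:                             # grow: record a new error pattern
            store[key] = Schema(key, guideline)

    def retrieve(self, stream: str, top_k: int = 3) -> list[str]:
        """Return the most reinforced guidelines to prepend to a new query."""
        store = self.visual if stream == "visual" else self.logical
        ranked = sorted(store.values(), key=lambda s: -s.hits)
        return [s.guideline for s in ranked[:top_k]]


mem = DualStreamMemory()
mem.grow("visual", "chart-legend", "Check the legend before reading bar colors.")
mem.grow("logical", "unit-mismatch", "Convert all quantities to the same unit first.")
# Refining an existing visual schema instead of duplicating it:
mem.grow("visual", "chart-legend", "Verify the legend; colors may swap across panels.")
print(mem.retrieve("visual"))
```

Keeping the two streams as separate stores mirrors the abstract's point that visual and logical knowledge should remain distinct but coordinated; a retrieval step can then draw guidelines from both streams for a new multimodal query.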