[2511.21678] Agentic Learner with Grow-and-Refine Multimodal Semantic Memory
Computer Science > Artificial Intelligence
arXiv:2511.21678 (cs)
[Submitted on 26 Nov 2025 (v1), last revised 2 May 2026 (this version, v2)]

Title: Agentic Learner with Grow-and-Refine Multimodal Semantic Memory
Authors: Weihao Bo, Shan Zhang, Yanpeng Sun, Jingjing Wu, Qunyi Xie, Xiao Tan, Kunbin Chen, Wei He, Xiaofan Li, Na Zhao, Jingdong Wang, Zechao Li

Abstract: MLLMs exhibit strong reasoning on isolated queries, yet they operate de novo -- solving each problem independently and often repeating the same mistakes. Existing memory-augmented agents mainly store past trajectories for reuse. However, trajectory-based memory suffers from brevity bias, gradually losing essential domain knowledge. More critically, even in truly multimodal problem-solving settings, it records only a single-modality trace of past behavior, failing to preserve how visual attention and logical reasoning jointly contributed to the solution. This is fundamentally misaligned with human cognition: semantic memory is both multimodal and integrated, preserving visual and abstract knowledge through coordinated but distinct representational streams. We thus introduce ViLoMem, a dual-stream memory framework that constructs compact, schema-based memory. It separately encodes visual distraction patterns and logical reasoning errors, enabling MLLMs to learn...
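The abstract describes a dual-stream, schema-based memory that grows new entries from errors and refines existing ones. A minimal sketch of that idea, assuming a simple keyed schema store per stream (all class, field, and method names here are illustrative assumptions, not the paper's actual implementation):

```python
# Hypothetical sketch of a dual-stream grow-and-refine semantic memory,
# in the spirit of ViLoMem as described in the abstract. All names are
# illustrative assumptions, not the paper's actual API.
from dataclasses import dataclass


@dataclass
class Schema:
    """A compact memory entry: a lesson distilled from a past error."""
    key: str          # short description of the error pattern
    guideline: str    # corrective rule to apply on future queries
    hits: int = 1     # how often this schema has been reinforced


class DualStreamMemory:
    """Two separate streams: visual distraction patterns and logical errors."""

    def __init__(self):
        self.visual: dict[str, Schema] = {}
        self.logical: dict[str, Schema] = {}

    def grow(self, stream: str, key: str, guideline: str) -> None:
        """Grow a new schema, or refine (reinforce/update) an existing one."""
        store = self.visual if stream == "visual" else self.logical
        if key in store:                  # refine: update the prior lesson
            store[key].guideline = guideline
            store[key].hits += 1
        else:                             # grow: record a new error pattern
            store[key] = Schema(key, guideline)

    def retrieve(self, stream: str, top_k: int = 3) -> list[str]:
        """Return the most reinforced guidelines to prepend to a new query."""
        store = self.visual if stream == "visual" else self.logical
        ranked = sorted(store.values(), key=lambda s: -s.hits)
        return [s.guideline for s in ranked[:top_k]]


mem = DualStreamMemory()
mem.grow("visual", "chart-legend", "Check the legend before reading bar colors.")
mem.grow("logical", "unit-mismatch", "Convert all quantities to the same unit first.")
# Refining an existing visual schema instead of duplicating it:
mem.grow("visual", "chart-legend", "Verify the legend; colors may swap across panels.")
print(mem.retrieve("visual"))
```

Keeping the two streams as separate stores mirrors the abstract's point that visual and logical knowledge should remain distinct but coordinated; a retrieval step can then draw guidelines from both streams for a new multimodal query.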