[2602.15513] Improving MLLMs in Embodied Exploration and Question Answering with Human-Inspired Memory Modeling
Summary
This paper presents a novel non-parametric memory framework for improving Multimodal Large Language Models (MLLMs) in embodied exploration and question answering, enhancing performance through human-inspired memory modeling.
Why It Matters
The research addresses significant challenges in deploying MLLMs for embodied agents, particularly in dynamic environments. By improving memory modeling, it enhances the efficiency and reasoning capabilities of AI systems, which is crucial for advancing robotics and AI applications.
Key Takeaways
- Introduces a non-parametric memory framework that separates episodic and semantic memory.
- Demonstrates state-of-the-art performance improvements in embodied question answering benchmarks.
- Highlights the importance of episodic memory for exploration efficiency and semantic memory for complex reasoning.
- Offers a retrieval-first, reasoning-assisted approach that enhances memory reuse.
- Provides insights into cross-environment generalization capabilities of embodied agents.
Computer Science > Robotics
arXiv:2602.15513 (cs) [Submitted on 17 Feb 2026]
Title: Improving MLLMs in Embodied Exploration and Question Answering with Human-Inspired Memory Modeling
Authors: Ji Li, Jing Xia, Mingyi Li, Shiyan Hu
Abstract: Deploying Multimodal Large Language Models as the brain of embodied agents remains challenging, particularly under long-horizon observations and limited context budgets. Existing memory-assisted methods often rely on textual summaries, which discard rich visual and spatial details and remain brittle in non-stationary environments. In this work, we propose a non-parametric memory framework that explicitly disentangles episodic and semantic memory for embodied exploration and question answering. Our retrieval-first, reasoning-assisted paradigm recalls episodic experiences via semantic similarity and verifies them through visual reasoning, enabling robust reuse of past observations without rigid geometric alignment. In parallel, we introduce a program-style rule extraction mechanism that converts experiences into structured, reusable semantic memory, facilitating cross-environment generalization. Extensive experiments demonstrate state-of-the-art performance on embodied question answering and exploration benchmarks, yielding a 7.3% gain in LLM-Match and an 1...
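The retrieval-first, reasoning-assisted recall described in the abstract can be pictured as a two-stage pipeline: rank stored episodes by semantic similarity to the query, then let a reasoning step verify the candidates before reuse. The sketch below illustrates that shape only; the class name, the embedding space, and the `verify` callback (standing in for the paper's MLLM visual-reasoning check) are all illustrative assumptions, not the authors' implementation.

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class EpisodicMemory:
    """Toy non-parametric episodic store: (embedding, observation) pairs."""

    def __init__(self):
        self.entries = []

    def store(self, embedding, observation):
        self.entries.append((embedding, observation))

    def recall(self, query_embedding, k=3, verify=None):
        # Retrieval first: rank stored episodes by semantic similarity.
        ranked = sorted(self.entries,
                        key=lambda e: cosine(e[0], query_embedding),
                        reverse=True)[:k]
        # Reasoning-assisted: an optional verifier (in the paper, a visual
        # reasoning check by the MLLM) filters candidates before reuse.
        if verify is not None:
            ranked = [e for e in ranked if verify(e[1])]
        return [obs for _, obs in ranked]

mem = EpisodicMemory()
mem.store([1.0, 0.0], "red door at hallway end")
mem.store([0.0, 1.0], "kitchen with white table")
print(mem.recall([0.9, 0.1], k=1))  # -> ['red door at hallway end']
```

Because recall is similarity-based rather than geometry-based, past observations can be reused even when the agent's pose or map alignment has drifted, which is the robustness the abstract attributes to avoiding rigid geometric alignment.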