[2604.12890] Towards Long-horizon Agentic Multimodal Search
Computer Science > Computer Vision and Pattern Recognition

arXiv:2604.12890 (cs)

[Submitted on 14 Apr 2026]

Title: Towards Long-horizon Agentic Multimodal Search

Authors: Yifan Du, Zikang Liu, Jinbiao Peng, Jie Wu, Junyi Li, Jinyang Li, Wayne Xin Zhao, Ji-Rong Wen

Abstract: Multimodal deep search agents have shown great potential in solving complex tasks by iteratively collecting textual and visual evidence. However, managing the heterogeneous information and high token costs associated with multimodal inputs over long horizons remains a critical challenge, as existing methods often suffer from context explosion or the loss of crucial visual signals. To address this, we propose a novel Long-horizon MultiModal deep search framework, named LMM-Searcher, centered on a file-based visual representation mechanism. By offloading visual assets to an external file system and mapping them to lightweight textual identifiers (UIDs), our approach mitigates context overhead while preserving multimodal information for future access. We equip the agent with a tailored fetch-image tool, enabling a progressive, on-demand visual loading strategy for active perception. Furthermore, we introduce a data synthesis pipeline designed to generate queries requiring complex cross-modal multi-hop reasoning. Using this pipeline, we distill 12K high-quality trajectories...
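A minimal sketch of the file-based visual representation mechanism the abstract describes: images are offloaded to an external file system and replaced in the agent's text context by lightweight UID strings, with a fetch-image tool that reloads a specific image only when the agent needs to inspect it. The names here (VisualStore, offload, fetch_image, the IMG-prefixed UID format) are illustrative assumptions, not the paper's actual implementation.

```python
import uuid
from pathlib import Path


class VisualStore:
    """Offloads visual assets to an external file system and hands the
    agent lightweight textual identifiers (UIDs) instead of raw pixels."""

    def __init__(self, root: str = "./visual_assets"):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self._index: dict[str, Path] = {}  # UID -> file path

    def offload(self, image_bytes: bytes, ext: str = "png") -> str:
        """Persist an image to disk and return a short UID that stands
        in for it in the agent's long-horizon context."""
        uid = f"IMG-{uuid.uuid4().hex[:8]}"
        path = self.root / f"{uid}.{ext}"
        path.write_bytes(image_bytes)
        self._index[uid] = path
        return uid

    def fetch_image(self, uid: str) -> bytes:
        """Tool exposed to the agent: load an offloaded image back into
        the multimodal context on demand, for active perception."""
        if uid not in self._index:
            raise KeyError(f"Unknown image UID: {uid}")
        return self._index[uid].read_bytes()
```

Under this sketch, each piece of accumulated visual evidence costs only a few text tokens (its UID) in the running context, and full image tokens are paid only at the reasoning steps where fetch_image is actually invoked, which is the intended trade-off behind the progressive, on-demand loading strategy.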