[2603.01416] Securing the Floor and Raising the Ceiling: A

[2603.01416] Securing the Floor and Raising the Ceiling: A Merging-based Paradigm for Multi-modal Search Agents

arXiv - AI March 03, 2026 4 min read

About this article

Abstract page for arXiv paper 2603.01416: Securing the Floor and Raising the Ceiling: A Merging-based Paradigm for Multi-modal Search Agents

Computer Science > Artificial Intelligence arXiv:2603.01416 (cs) [Submitted on 2 Mar 2026] Title:Securing the Floor and Raising the Ceiling: A Merging-based Paradigm for Multi-modal Search Agents Authors:Zhixiang Wang, Jingxuan Xu, Dajun Chen, Yunfang Wu, Wei Jiang, Yong Li View a PDF of the paper titled Securing the Floor and Raising the Ceiling: A Merging-based Paradigm for Multi-modal Search Agents, by Zhixiang Wang and 5 other authors View PDF HTML (experimental) Abstract:Recent advances in Vision-Language Models (VLMs) have motivated the development of multi-modal search agents that can actively invoke external search tools and integrate retrieved evidence through multi-step reasoning. While promising, existing approaches typically rely on large-scale supervised trajectories or expensive reinforcement learning (RL), leading to high training cost, instability, and a severe cold-start problem for standard VLMs. We propose a training-free paradigm to empower VLMs with autonomous search capabilities via cross-modal model merging. By fusing a text-based search agent with a base VLM, we show that multi-modal search capabilities can be effectively composed without any additional multi-modal training data. To mitigate parameter interference during cross-modal integration, we introduce Optimal Brain Merging (OBM), a saliency-aware merging algorithm that identifies task-critical parameters based on their impact on model loss using only a small set of calibration samples. Extens...

Originally published on March 03, 2026. Curated by AI News.

Llms

[2602.07238] Is there "Secret Sauce'' in Large Language Model Development?

Abstract page for arXiv paper 2602.07238: Is there "Secret Sauce'' in Large Language Model Development?

arXiv - Machine Learning · 3 min · about 4 hours ago

Llms

[2602.01203] Attention Sink Forges Native MoE in Attention Layers: Sink-Aware Training to Address Head Collapse

Abstract page for arXiv paper 2602.01203: Attention Sink Forges Native MoE in Attention Layers: Sink-Aware Training to Address Head Collapse

arXiv - Machine Learning · 4 min · about 4 hours ago

Llms

[2601.01322] LinMU: Multimodal Understanding Made Linear

Abstract page for arXiv paper 2601.01322: LinMU: Multimodal Understanding Made Linear

arXiv - Machine Learning · 4 min · about 4 hours ago

Llms

[2512.05525] Poodle: Seamlessly Scaling Down Large Language Models with Just-in-Time Model Replacement

Abstract page for arXiv paper 2512.05525: Poodle: Seamlessly Scaling Down Large Language Models with Just-in-Time Model Replacement

arXiv - Machine Learning · 4 min · about 4 hours ago

[2603.01416] Securing the Floor and Raising the Ceiling: A Merging-based Paradigm for Multi-modal Search Agents

About this article

Related Articles

[2602.07238] Is there "Secret Sauce'' in Large Language Model Development?

[2602.01203] Attention Sink Forges Native MoE in Attention Layers: Sink-Aware Training to Address Head Collapse

[2601.01322] LinMU: Multimodal Understanding Made Linear

[2512.05525] Poodle: Seamlessly Scaling Down Large Language Models with Just-in-Time Model Replacement

No comments

Stay updated with AI News