[2505.19892] OptMerge: Unifying Multimodal LLM Capabilities and Modalities via Model Merging
Computer Science > Artificial Intelligence

arXiv:2505.19892 (cs)

[Submitted on 26 May 2025 (v1), last revised 3 Mar 2026 (this version, v3)]

Title: OptMerge: Unifying Multimodal LLM Capabilities and Modalities via Model Merging

Authors: Yongxian Wei, Runxi Cheng, Weike Jin, Enneng Yang, Li Shen, Lu Hou, Sinan Du, Chun Yuan, Xiaochun Cao, Dacheng Tao

Abstract: Foundation models update slowly due to resource-intensive training, whereas domain-specific models evolve rapidly between releases. Model merging seeks to combine multiple expert models into a single, more capable model, reducing storage and serving costs while supporting decentralized development. Despite its potential, previous studies have primarily focused on merging visual classification models or Large Language Models (LLMs) for code and math tasks. Recently, Multimodal LLMs (MLLMs), which extend LLMs through large-scale multimodal training, have gained traction. However, the field lacks a model merging benchmark that clearly separates the tasks used for MLLM training and evaluation. In this paper, $\textbf{(i)}$ we introduce a model merging benchmark for MLLMs, which includes multiple tasks such as VQA, Geometry, Chart, OCR, and Grounding, studying both LoRA and full fine-tuning models. Moreover, we explore how model merging can combine diff...
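To make the idea of model merging concrete, the following is a minimal sketch of the simplest variant, weighted parameter averaging of expert models fine-tuned from a shared base. This is an illustrative toy, not the paper's OptMerge method; the expert names (`vqa_expert`, `ocr_expert`) and the use of plain float lists in place of weight tensors are assumptions made for the example.

```python
# Toy illustration of model merging via weighted parameter averaging.
# Each "model" is a dict mapping parameter names to lists of floats,
# standing in for the weight tensors of experts fine-tuned from one base.

def merge_models(models, weights=None):
    """Merge expert models by a (weighted) elementwise average of parameters."""
    if weights is None:
        # Default to a uniform average over all experts.
        weights = [1.0 / len(models)] * len(models)
    merged = {}
    for name in models[0]:
        merged[name] = [
            sum(w * m[name][i] for m, w in zip(models, weights))
            for i in range(len(models[0][name]))
        ]
    return merged

# Hypothetical experts for two of the benchmark's task types.
vqa_expert = {"layer.w": [1.0, 2.0]}
ocr_expert = {"layer.w": [3.0, 4.0]}

merged = merge_models([vqa_expert, ocr_expert])
print(merged)  # {'layer.w': [2.0, 3.0]}
```

Methods studied in the merging literature differ mainly in how these per-parameter weights are chosen and in how conflicting parameter updates between experts are resolved.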