[2602.20208] Model Merging in the Essential Subspace
Summary
This paper presents ESM, a novel framework for merging multiple task-specific models into a single multi-task model, addressing inter-task interference and enhancing performance through essential subspace analysis.
Why It Matters
Model merging is crucial in machine learning as it allows for the integration of specialized models without retraining, thus saving resources and time. This research introduces a robust method to overcome common challenges in merging, potentially leading to more efficient AI systems.
Key Takeaways
- Introduces ESM, a framework for effective model merging.
- Utilizes PCA to identify essential subspaces for merging.
- Mitigates inter-task interference while preserving task-specific functionality.
- Employs a polarized scaling strategy to enhance critical knowledge.
- Demonstrates state-of-the-art performance across multiple tasks.
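The takeaways above can be illustrated with a minimal sketch of the PCA step: identify an "essential subspace" from feature shifts and project a task's parameter update onto it for a low-rank decomposition. The function names and the use of an SVD-based PCA here are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def essential_subspace(feature_shifts: np.ndarray, rank: int) -> np.ndarray:
    """Return an orthonormal basis (dim x rank) spanning the top principal
    directions of the feature shifts (rows = samples, columns = features)."""
    centered = feature_shifts - feature_shifts.mean(axis=0, keepdims=True)
    # Right singular vectors of the centered data are the principal axes.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:rank].T  # columns are the leading principal directions

def project_update(delta_w: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Project a task's update matrix onto its essential subspace,
    yielding a low-rank approximation that keeps the dominant directions."""
    return basis @ (basis.T @ delta_w)

rng = np.random.default_rng(0)
shifts = rng.normal(size=(256, 64))       # stand-in for measured feature shifts
basis = essential_subspace(shifts, rank=8)
delta = rng.normal(size=(64, 64))         # stand-in for a task's update matrix
low_rank_delta = project_update(delta, basis)
print(np.linalg.matrix_rank(low_rank_delta))  # rank is at most 8
```

Projecting each task's update this way before merging is what, per the summary, limits inter-task interference: updates only interact within their dominant directions.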
Computer Science > Machine Learning
arXiv:2602.20208 (cs)
[Submitted on 23 Feb 2026]
Title: Model Merging in the Essential Subspace
Authors: Longhua Li, Lei Qi, Qi Tian, Xin Geng
Abstract: Model merging aims to integrate multiple task-specific fine-tuned models derived from a shared pre-trained checkpoint into a single multi-task model without additional training. Despite extensive research, task interference remains a major obstacle that often undermines the performance of merged models. In this paper, we propose ESM (Essential Subspace Merging), a robust framework for effective model merging. We begin by performing Principal Component Analysis (PCA) on feature shifts induced by parameter updates. The resulting principal directions span an essential subspace that dominantly influences feature representations. Each task's parameter update matrix is projected onto its respective essential subspace for low-rank decomposition before merging. This methodology mitigates inter-task interference while preserving core task-specific functionality. Furthermore, we introduce a multi-level polarized scaling strategy that amplifies parameters containing critical knowledge and suppresses redundant ones, preventing essential knowledge from being overwhelmed during fusion. Extensive experiments across multiple task sets and model scales demonstrate that our ...
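The polarized scaling idea in the abstract can be sketched as follows: entries of each task's update with large magnitude (assumed to carry critical knowledge) are amplified, small ones (assumed redundant) are suppressed, and the scaled updates are then averaged into the pre-trained weights. The quantile threshold, the gain values, and the name `polarized_scale` are illustrative assumptions, not the paper's exact multi-level formulation.

```python
import numpy as np

def polarized_scale(delta: np.ndarray, quantile: float = 0.8,
                    amplify: float = 1.5, suppress: float = 0.5) -> np.ndarray:
    """Amplify large-magnitude entries and suppress small ones."""
    thresh = np.quantile(np.abs(delta), quantile)
    gains = np.where(np.abs(delta) >= thresh, amplify, suppress)
    return gains * delta

def merge(pretrained: np.ndarray, task_deltas: list) -> np.ndarray:
    """Average the polarized task updates into the shared pre-trained weights."""
    scaled = [polarized_scale(d) for d in task_deltas]
    return pretrained + np.mean(scaled, axis=0)

rng = np.random.default_rng(1)
w0 = rng.normal(size=(32, 32))                        # shared checkpoint
deltas = [0.01 * rng.normal(size=(32, 32)) for _ in range(3)]  # per-task updates
w_merged = merge(w0, deltas)
print(w_merged.shape)  # (32, 32)
```

The scaling step is what keeps one task's critical parameters from being washed out when updates from several tasks are averaged during fusion.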