[2509.23592] Toward a Holistic Approach to Continual Model Merging
Summary
The paper presents a holistic framework for Continual Model Merging (CMM) that addresses scalability and performance issues in continual learning by intervening at three critical stages: before, during, and after merging.
Why It Matters
This research is significant as it tackles the challenges of catastrophic forgetting in machine learning models, providing a scalable solution that enhances performance without the need for historical data. This has implications for various applications in AI where continual learning is essential.
Key Takeaways
- Introduces a three-stage framework for Continual Model Merging.
- Addresses scalability issues associated with conventional continual learning methods.
- Utilizes optimizer states to enhance merging without revisiting old data.
- Demonstrates competitive performance on standard benchmarks.
- Mitigates catastrophic forgetting while operating under constant memory constraints.
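The optimizer-state idea in the takeaways above can be illustrated with a minimal sketch: instead of plainly averaging weights, each task vector is weighted per-parameter by that task's squared-gradient (e.g. Adam second-moment) estimates, a cheap Fisher-information proxy. The paper does not publish this code; the function name and the exact weighting scheme are illustrative assumptions.

```python
import numpy as np

def merge_with_optimizer_states(base, task_params, second_moments, eps=1e-12):
    """Merge fine-tuned models into a base model without revisiting old data.

    base:           dict name -> np.ndarray, pre-trained weights
    task_params:    list of dicts, one per task (fine-tuned weights)
    second_moments: list of dicts, per-task optimizer second moments
                    (e.g. Adam's v_t), used as per-parameter importance.
    """
    merged = {}
    for name, w0 in base.items():
        imp = np.stack([m[name] for m in second_moments])        # (T, ...)
        vec = np.stack([p[name] - w0 for p in task_params])      # task vectors
        # Importance-weighted average of task vectors, added back to the base.
        merged[name] = w0 + (imp * vec).sum(0) / (imp.sum(0) + eps)
    return merged
```

With equal importance weights this reduces to plain task-vector averaging; unequal weights let each task dominate the parameters it actually relied on during fine-tuning.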
Computer Science > Machine Learning
arXiv:2509.23592 (cs)
[Submitted on 28 Sep 2025 (v1), last revised 20 Feb 2026 (this version, v2)]
Title: Toward a Holistic Approach to Continual Model Merging
Authors: Hoang Phan, Sungmin Cha, Tung Lam Tran, Qi Lei
Abstract: We present a holistic framework for Continual Model Merging (CMM) that intervenes at three critical stages: pre-merging, during merging, and post-merging, to address two fundamental challenges in continual learning. In particular, conventional approaches either maintain a growing list of per-domain task vectors, leading to scalability issues, or rely solely on weight-space merging when old data is inaccessible, thereby losing crucial functional information. Our method overcomes these limitations by first fine-tuning the main model within its tangent space on domain-specific data; this linearization amplifies per-task weight disentanglement, effectively mitigating across-task interference. During merging, we leverage functional information from available optimizer states beyond mere parameter averages to avoid the need to revisit old data. Finally, a post-merging correction aligns the representation discrepancy between pre- and post-merged models, reducing bias and enhancing overall performance, all while operating under constant memory constraints without accessing historical data.
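The tangent-space fine-tuning mentioned in the abstract means training a first-order Taylor expansion of the network around the pre-trained weights: f_lin(x; θ) = f(x; θ0) + J_θ f(x; θ0) · (θ − θ0), which is linear in θ and is what makes per-task weight disentanglement (and later merging) better behaved. The following sketch is not the paper's implementation; it estimates the Jacobian by central finite differences purely for illustration (real linearized training would use exact JVPs).

```python
import numpy as np

def linearized_forward(f, theta0, theta, x, eps=1e-5):
    """First-order Taylor expansion of f around theta0:
        f_lin(x; theta) = f(x; theta0) + J(theta0) @ (theta - theta0)
    Optimizing theta under f_lin is fine-tuning in the tangent space at
    theta0. The Jacobian column for each parameter is approximated by a
    central finite difference (illustration only; use exact JVPs in practice).
    """
    y0 = np.asarray(f(theta0, x), dtype=float)
    delta = theta - theta0
    out = y0.copy()
    for i in range(theta0.size):
        e = np.zeros_like(theta0)
        e[i] = eps
        jac_i = (np.asarray(f(theta0 + e, x)) - np.asarray(f(theta0 - e, x))) / (2 * eps)
        out += jac_i * delta[i]
    return out
```

For a model already linear in its parameters the expansion is exact, which is a convenient sanity check; for a nonlinear network it defines the surrogate that is actually trained.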