[2508.21421] Rethinking Layer-wise Model Merging through Chain of Merges

arXiv - Machine Learning 4 min read Article

Summary

This paper presents Chain of Merges (CoM), a method for merging fine-tuned variants of a pretrained model. CoM addresses a limitation of existing layer-wise merging techniques, which treat each layer in isolation, by accounting for inter-layer dependencies during merging.

Why It Matters

As the number of specialized, task-specific models grows, merging them into a single model without retraining becomes increasingly important. The paper proposes a method that mitigates the distributional mismatches that arise when layers are merged independently, with the aim of improving the performance of the merged model.

Key Takeaways

  • Existing merging techniques operate on individual layers and overlook inter-layer dependencies, so changes in early layers are not propagated to downstream layers.
  • Chain of Merges (CoM) merges weights layer by layer while sequentially updating activation statistics, which addresses the resulting covariate shift.
  • CoM is reported to achieve state-of-the-art performance on standard benchmarks.
  • The authors identify the mismatches as a form of internal covariate shift, comparable to what occurs in the initial phases of neural network training.
  • The approach targets the growing need to unify task-specific fine-tuned models without retraining.
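To make the mismatch concrete, here is a toy NumPy sketch. It is not taken from the paper: the two-variant setup, the plain weight averaging used as the merge, and all names and sizes are illustrative assumptions. It only shows that once a first layer is merged, the activations fed to the next layer no longer match the ones that layer saw in the fine-tuned model.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(x):
    return np.maximum(x, 0.0)


# Hypothetical setup: two fine-tuned variants of one pretrained first layer.
d_in, d_hidden, n = 8, 16, 512
w_pre = rng.normal(size=(d_in, d_hidden))
w_a = w_pre + 0.5 * rng.normal(size=(d_in, d_hidden))
w_b = w_pre + 0.5 * rng.normal(size=(d_in, d_hidden))

# Plain weight averaging, chosen only because it is the simplest merge.
w_merged = 0.5 * (w_a + w_b)

x = rng.normal(size=(n, d_in))  # stand-in for task A inputs

h_finetuned = relu(x @ w_a)    # inputs the second layer of model A was tuned on
h_merged = relu(x @ w_merged)  # inputs the second layer sees after merging

# Two simple summaries of the shift: first moments and (uncentered) second moments.
mean_gap = np.linalg.norm(h_finetuned.mean(axis=0) - h_merged.mean(axis=0))
gram_gap = np.linalg.norm(h_finetuned.T @ h_finetuned - h_merged.T @ h_merged) / n

print(f"mean gap: {mean_gap:.3f}, second-moment gap: {gram_gap:.3f}")
```

Any merging rule that uses activation statistics collected from the fine-tuned models alone would be working from `h_finetuned`, while the merged network actually produces `h_merged`.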

Computer Science > Machine Learning

arXiv:2508.21421 (cs)

Submitted on 29 Aug 2025 (v1), last revised 25 Feb 2026 (this version, v3)

Title: Rethinking Layer-wise Model Merging through Chain of Merges

Authors: Pietro Buzzega, Riccardo Salami, Angelo Porrello, Simone Calderara

Abstract: Fine-tuning pretrained models has become a standard pathway to achieve state-of-the-art performance across a wide range of domains, leading to a proliferation of task-specific model variants. As the number of such specialized models increases, merging them into a unified model without retraining has become a critical challenge. Existing merging techniques operate at the level of individual layers, thereby overlooking the inter-layer dependencies inherent in deep networks. We show that this simplification leads to distributional mismatches, particularly in methods that rely on intermediate activations, as changes in early layers are not properly propagated to downstream layers during merging. We identify these mismatches as a form of internal covariate shift, comparable to the phenomenon encountered in the initial phases of neural networks training. To address this, we propose Chain of Merges (CoM), a layer-wise merging procedure that sequentially merges weights across layers while sequentially updating activation statistics. By explicitl...
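The idea of merging layers in sequence while refreshing activation statistics can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the merging rule, so the sketch assumes a ridge-regularized least-squares fit per layer, a bias-free ReLU MLP, and hypothetical helper names (`merge_layer`, `chain_merge`). The only property it is meant to show is that the inputs used to merge layer l come from the already merged layers before it.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def merge_layer(merged_inputs, targets, ridge=1e-6):
    """Assumed merge rule: W minimizing sum_i ||X_hat_i @ W - T_i||^2 (plus ridge)."""
    d = merged_inputs[0].shape[1]
    gram = ridge * np.eye(d)
    cross = np.zeros((d, targets[0].shape[1]))
    for x_hat, t in zip(merged_inputs, targets):
        gram += x_hat.T @ x_hat
        cross += x_hat.T @ t
    return np.linalg.solve(gram, cross)


def chain_merge(models, datasets):
    """models: one list of weight matrices per fine-tuned model.
    datasets: one input array per model (its own task data)."""
    n_layers = len(models[0])
    own = list(datasets)        # activations inside each fine-tuned model
    merged_in = list(datasets)  # activations inside the partially merged model
    merged = []
    for layer in range(n_layers):
        # Each fine-tuned layer's output on its own inputs is the fitting target.
        targets = [x @ m[layer] for x, m in zip(own, models)]
        w = merge_layer(merged_in, targets)
        merged.append(w)
        is_last = layer == n_layers - 1
        # Propagate both streams; the merged stream uses the weights just merged,
        # so the statistics for the next layer reflect the upstream changes.
        own = [t if is_last else relu(t) for t in targets]
        merged_in = [x @ w if is_last else relu(x @ w) for x in merged_in]
    return merged


# Toy usage with two hypothetical fine-tuned variants of a two-layer MLP.
rng = np.random.default_rng(0)
base = [rng.normal(size=s) for s in [(8, 16), (16, 4)]]
model_a = [w + 0.1 * rng.normal(size=w.shape) for w in base]
model_b = [w + 0.1 * rng.normal(size=w.shape) for w in base]
data = [rng.normal(size=(256, 8)), rng.normal(size=(256, 8))]

merged = chain_merge([model_a, model_b], data)
print([w.shape for w in merged])
```

An independent layer-wise variant would pass `own` in place of `merged_in` to `merge_layer`, which is the simplification the abstract argues against. A quick sanity check for the sketch is that merging a model with itself recovers its weights up to the ridge term.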
