[2602.05495] Transport and Merge: Cross-Architecture Merging for Large Language Models
Summary
This paper presents a framework for cross-architecture merging of large language models (LLMs), using optimal transport to transfer knowledge from large, high-resource models to smaller, low-resource ones.
Why It Matters
As LLMs become increasingly prevalent, the ability to effectively transfer knowledge to smaller, low-resource models is crucial for broadening access and application in diverse domains. This research addresses a significant gap in existing methodologies by allowing heterogeneous model merging, which can enhance performance in under-resourced languages and specialized applications.
Key Takeaways
- Introduces a cross-architecture merging framework using optimal transport.
- Facilitates effective knowledge transfer from large models to smaller, low-resource counterparts.
- Demonstrates consistent performance improvements across various low-resource languages and domains.
Computer Science > Computation and Language
arXiv:2602.05495 (cs) [Submitted on 5 Feb 2026 (v1), last revised 22 Feb 2026 (this version, v2)]
Title: Transport and Merge: Cross-Architecture Merging for Large Language Models
Authors: Chenhang Cui, Binyun Yang, Fei Shen, Yuxin Chen, Jingnan Zheng, Xiang Wang, An Zhang, Tat-Seng Chua
Abstract: Large language models (LLMs) achieve strong capabilities by scaling model capacity and training data, yet many real-world deployments rely on smaller models trained or adapted from low-resource data. This gap motivates the need for mechanisms to transfer knowledge from large, high-resource models to smaller, low-resource targets. While model merging provides an effective transfer mechanism, most existing approaches assume architecture-compatible models and therefore cannot directly transfer knowledge from large high-resource LLMs to heterogeneous low-resource targets. In this work, we propose a cross-architecture merging framework based on optimal transport (OT) that aligns activations to infer cross-neuron correspondences between heterogeneous models. The resulting transport plans are then used to guide direct weight-space fusion, enabling effective high-resource to low-resource transfer using only a small set of inputs. Extensive experiments across low-resource languages...
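The abstract describes a two-step recipe: compute an OT plan between the neurons of two layers from their activations on a small probe set, then use that plan to map source weights into the target's neuron basis before fusing. The paper's exact formulation is not given here, so the sketch below is only illustrative: it uses entropy-regularized OT (Sinkhorn iterations), toy random activations in place of real probe activations, and a simple interpolation coefficient `alpha` that is a free choice, not a parameter from the paper.

```python
import numpy as np

def sinkhorn(cost, reg=0.05, n_iters=200):
    """Entropy-regularized OT between uniform marginals via Sinkhorn iterations."""
    n, m = cost.shape
    a, b = np.ones(n) / n, np.ones(m) / m     # uniform source/target marginals
    K = np.exp(-cost / reg)                   # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]        # transport plan, entries sum to 1

# Toy activations: each neuron is described by its responses to T probe inputs.
rng = np.random.default_rng(0)
T, d_src, d_tgt = 64, 8, 6
A_src = rng.normal(size=(T, d_src))           # stand-in for source-layer activations
A_tgt = rng.normal(size=(T, d_tgt))           # stand-in for target-layer activations

# Cost(i, j) = squared distance between activation profiles of source neuron i
# and target neuron j; normalizing by the max keeps exp(-cost/reg) well-scaled.
cost = ((A_src[:, :, None] - A_tgt[:, None, :]) ** 2).sum(axis=0)
P = sinkhorn(cost / cost.max())               # shape (d_src, d_tgt)

# Map source weights into the target's neuron basis with the column-normalized
# plan, then fuse with the target weights by interpolation (alpha is arbitrary).
W_src = rng.normal(size=(d_src, 16))          # hypothetical source layer weights
W_tgt = rng.normal(size=(d_tgt, 16))          # hypothetical target layer weights
M = P / P.sum(axis=0, keepdims=True)          # each target neuron = mix of source neurons
alpha = 0.5
W_merged = alpha * W_tgt + (1 - alpha) * (M.T @ W_src)
print(W_merged.shape)                         # → (6, 16)
```

The key point the sketch captures is dimensional: because the transport plan is rectangular, the mapped source weights land in the target's shape, so heterogeneous layers can be fused without architectural compatibility.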