Transport and Merge: Cross-Architecture Merging for Large Language Models

arXiv - AI · 3 min read

Summary

This paper presents a framework for cross-architecture merging of large language models (LLMs), using optimal transport to enable knowledge transfer from high-resource to low-resource models.

Why It Matters

As LLMs become increasingly prevalent, the ability to transfer knowledge to smaller, low-resource models is crucial for broadening access and application across diverse domains. Most existing merging methods assume architecture-compatible models; by supporting heterogeneous model merging, this work addresses that gap and can improve performance in under-resourced languages and specialized applications.

Key Takeaways

  • Introduces a cross-architecture merging framework using optimal transport.
  • Facilitates effective knowledge transfer from large models to smaller, low-resource counterparts.
  • Demonstrates consistent performance improvements across various low-resource languages and domains.

Computer Science > Computation and Language
arXiv:2602.05495 (cs)
[Submitted on 5 Feb 2026 (v1), last revised 22 Feb 2026 (this version, v2)]

Title: Transport and Merge: Cross-Architecture Merging for Large Language Models
Authors: Chenhang Cui, Binyun Yang, Fei Shen, Yuxin Chen, Jingnan Zheng, Xiang Wang, An Zhang, Tat-Seng Chua

Abstract: Large language models (LLMs) achieve strong capabilities by scaling model capacity and training data, yet many real-world deployments rely on smaller models trained or adapted from low-resource data. This gap motivates the need for mechanisms to transfer knowledge from large, high-resource models to smaller, low-resource targets. While model merging provides an effective transfer mechanism, most existing approaches assume architecture-compatible models and therefore cannot directly transfer knowledge from large high-resource LLMs to heterogeneous low-resource targets. In this work, we propose a cross-architecture merging framework based on optimal transport (OT) that aligns activations to infer cross-neuron correspondences between heterogeneous models. The resulting transport plans are then used to guide direct weight-space fusion, enabling effective high-resource to low-resource transfer using only a small set of inputs. Extensive experiments across low-resource languages...
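The abstract only sketches the method, so the following is a minimal, self-contained illustration of the general recipe it describes, not the authors' implementation: compute an entropic OT plan (via standard Sinkhorn iterations) between the activation profiles of two layers of different widths on a shared probe batch, then use the plan's barycentric projection to map the source layer's weights into the target's neuron space before fusing in weight space. All names, layer sizes, the cost function, the regularization strength, and the mixing coefficient are assumptions made for illustration.

```python
import numpy as np

def sinkhorn(cost, reg=0.05, n_iters=200):
    """Entropic-regularized OT between uniform marginals (Sinkhorn).

    Returns a transport plan P of the same shape as `cost` whose rows
    and columns sum to the uniform marginals.
    """
    n, m = cost.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-cost / reg)                    # Gibbs kernel
    u = np.ones(n)
    for _ in range(n_iters):                   # alternating scaling updates
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

# Toy setup (hypothetical sizes): a "large" source layer with 512
# neurons and a smaller target layer with 256 neurons, probed with the
# same small batch of inputs so their activations are comparable.
rng = np.random.default_rng(0)
n_probe, d_src, d_tgt, d_in = 64, 512, 256, 128

W_src = rng.normal(size=(d_src, d_in))         # source layer weights
W_tgt = rng.normal(size=(d_tgt, d_in))         # target layer weights
X = rng.normal(size=(n_probe, d_in))           # shared probe inputs

A_src = np.maximum(X @ W_src.T, 0.0)           # (n_probe, d_src) ReLU activations
A_tgt = np.maximum(X @ W_tgt.T, 0.0)           # (n_probe, d_tgt) ReLU activations

# Cost: pairwise distance between per-neuron activation profiles,
# one plausible choice among many for "aligning activations".
cost = np.linalg.norm(A_src.T[:, None, :] - A_tgt.T[None, :, :], axis=-1)
cost /= cost.max()

P = sinkhorn(cost)                             # (d_src, d_tgt) transport plan

# Barycentric projection: map source weights into the target's neuron
# space, then fuse in weight space with a mixing coefficient alpha
# (alpha = 0.5 is an arbitrary illustrative value).
T = P / P.sum(axis=0, keepdims=True)           # column-normalize the plan
W_src_aligned = T.T @ W_src                    # (d_tgt, d_in)
alpha = 0.5
W_merged = (1 - alpha) * W_tgt + alpha * W_src_aligned
print(W_merged.shape)                          # (256, 128)
```

In a real pipeline this alignment would be applied layer by layer with activations from a small calibration set, matching the paper's claim of transfer "using only a small set of inputs"; the toy random weights here exist only to make the sketch runnable.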

