[2511.21437] A Systematic Study of In-the-Wild Model Merging for Large Language Models

arXiv - Machine Learning March 31, 2026 4 min read

About this article

Abstract page for arXiv paper 2511.21437: A Systematic Study of In-the-Wild Model Merging for Large Language Models

Computer Science > Computation and Language

arXiv:2511.21437 (cs)

[Submitted on 26 Nov 2025 (v1), last revised 29 Mar 2026 (this version, v2)]

Title: A Systematic Study of In-the-Wild Model Merging for Large Language Models

Authors: Oğuz Kağan Hitit, Leander Girrbach, Zeynep Akata

Abstract: Model merging combines multiple fine-tuned checkpoints into a single model without additional training, offering an attractive approach to reusing models and efficiently improving performance. However, it remains unclear whether the advantages reported for settings where all merged experts have distinct roles and are tuned on clearly separated tasks also hold in settings where the merged experts do not have clearly distinct roles, but are trained on overlapping or even conflicting objectives. To evaluate this setting, we present a large-scale, systematic evaluation of "in-the-wild" model merging of heterogeneous experts, that may have been trained on overlapping or conflicting objectives. Concretely, we evaluate six state-of-the-art merging methods, including recent subspace methods, across four open-weight LLMs, twelve fine-tuned checkpoints per base model, and sixteen standard LLM benchmarks. Evaluating through standardized benchmarks, we measure both the probability that a model merged from a heterogeneous ...
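For readers unfamiliar with the idea, the following is a minimal sketch of what "combining fine-tuned checkpoints without additional training" can look like. It is not taken from the paper: the excerpt above does not name the six methods it evaluates, so this example uses a generic baseline, averaging each expert's difference from the base model and adding the result back to the base weights. The names `task_vector`, `merge_by_averaging`, and `scale`, and the toy one-parameter "checkpoints", are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Toy stand-in for a checkpoint: parameter name -> flat list of weights.
# (Assumption for illustration; real checkpoints hold tensors.)
Weights = Dict[str, List[float]]


def task_vector(base: Weights, expert: Weights) -> Weights:
    """Per-parameter difference between a fine-tuned expert and its base model."""
    return {
        name: [e - b for e, b in zip(expert[name], base[name])]
        for name in base
    }


def merge_by_averaging(
    base: Weights, experts: List[Weights], scale: float = 1.0
) -> Weights:
    """Add the scaled mean of the experts' task vectors to the base weights.

    With scale=1.0 this is equivalent to plain averaging of the expert weights.
    No gradient steps are taken, so no additional training is involved.
    """
    if not experts:
        raise ValueError("need at least one expert checkpoint")
    vectors = [task_vector(base, expert) for expert in experts]
    merged: Weights = {}
    for name, base_values in base.items():
        merged[name] = [
            b + scale * sum(v[name][i] for v in vectors) / len(vectors)
            for i, b in enumerate(base_values)
        ]
    return merged


# Hypothetical toy data: one base model and two experts fine-tuned from it.
base = {"layer.weight": [1.0, 2.0]}
expert_a = {"layer.weight": [2.0, 2.0]}
expert_b = {"layer.weight": [1.0, 4.0]}

merged = merge_by_averaging(base, [expert_a, expert_b])
print(merged)
```

In this toy case the two experts change different coordinates, so their updates do not interfere. The setting the abstract describes, experts trained on overlapping or conflicting objectives, corresponds to task vectors that pull the same weights in different directions, which is where a simple average like this one may cancel useful updates.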

Originally published on March 31, 2026. Curated by AI News.
