[2603.26786] A Step Toward Federated Pretraining of Multimodal Large Language Models

[2603.26786] A Step Toward Federated Pretraining of Multimodal Large Language Models

arXiv - Machine Learning 4 min read

About this article

Abstract page for arXiv paper 2603.26786: A Step Toward Federated Pretraining of Multimodal Large Language Models

Computer Science > Machine Learning arXiv:2603.26786 (cs) [Submitted on 25 Mar 2026] Title:A Step Toward Federated Pretraining of Multimodal Large Language Models Authors:Baochen Xiong, Yifan Xu, Xiaoshan Yang, Yaguang Song, Yaowei Wang, Changsheng Xu View a PDF of the paper titled A Step Toward Federated Pretraining of Multimodal Large Language Models, by Baochen Xiong and 5 other authors View PDF HTML (experimental) Abstract:The rapid evolution of Multimodal Large Language Models (MLLMs) is bottlenecked by the saturation of high-quality public data, while vast amounts of diverse multimodal data remain inaccessible in privacy-sensitive silos. Federated Learning (FL) offers a promising solution to unlock these distributed resources, but existing research focuses predominantly on fine-tuning, leaving the foundational pre-training phase largely unexplored. In this paper, we formally introduce the Federated MLLM Alignment (Fed-MA) task, a lightweight pre-training paradigm that freezes the vision encoder and LLM while collaboratively training the cross-modal projector. We identify two critical challenges in this setting: (i) parameter interference in aggregating local projectors; and (ii) gradient oscillations in one-pass collaborative SGD. To address these challenges, we propose Fed-CMP, a pioneering framework for federated MLLM pre-training. Fed-CMP employs Canonical Reliability-Aware Aggregation, which constructs a canonical space to decompose client projectors into a share...

Originally published on March 31, 2026. Curated by AI News.

Related Articles

Llms

My AI spent last night modifying its own codebase

I've been working on a local AI system called Apis that runs completely offline through Ollama. During a background run, Apis identified ...

Reddit - Artificial Intelligence · 1 min ·
Llms

Fake users generated by AI can't simulate humans — review of 182 research papers. Your thoughts?

https://www.researchsquare.com/article/rs-9057643/v1 There’s a massive trend right now where tech companies, businesses, even researchers...

Reddit - Artificial Intelligence · 1 min ·
Llms

Depth-first pruning seems to transfer from GPT-2 to Llama (unexpectedly well)

TL;DR: Removing the right transformer layers (instead of shrinking all layers) gives smaller, faster models with minimal quality loss — a...

Reddit - Artificial Intelligence · 1 min ·
[2603.23966] Policy-Guided Threat Hunting: An LLM enabled Framework with Splunk SOC Triage
Llms

[2603.23966] Policy-Guided Threat Hunting: An LLM enabled Framework with Splunk SOC Triage

Abstract page for arXiv paper 2603.23966: Policy-Guided Threat Hunting: An LLM enabled Framework with Splunk SOC Triage

arXiv - AI · 4 min ·
More in Llms: This Week Guide Trending

No comments

No comments yet. Be the first to comment!

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime