[2603.19544] Scalable Cross-Facility Federated Learning for Scientific Foundation Models on Multiple Supercomputers
Computer Science > Machine Learning
arXiv:2603.19544 (cs)
[Submitted on 20 Mar 2026]

Title: Scalable Cross-Facility Federated Learning for Scientific Foundation Models on Multiple Supercomputers
Authors: Yijiang Li, Zilinghan Li, Kyle Chard, Ian Foster, Todd Munson, Ravi Madduri, Kibaek Kim

Abstract: Artificial intelligence for scientific applications increasingly requires training large models on data that cannot be centralized due to privacy constraints, data sovereignty, or the sheer volume of data generated. Federated learning (FL) addresses this by enabling collaborative training without centralizing raw data, but scientific applications demand model scales that require extensive computing resources, typically offered at High Performance Computing (HPC) facilities. Deploying FL experiments across HPC facilities introduces challenges beyond those of cloud or enterprise settings. We present a comprehensive cross-facility FL framework for heterogeneous HPC environments, built on the Advanced Privacy-Preserving Federated Learning (APPFL) framework with Globus Compute and Globus Transfer orchestration, and evaluate it across four U.S. Department of Energy (DOE) leadership-class supercomputers. We demonstrate that FL experiments across HPC facilities are practically achievable, characterize key source...
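The abstract describes orchestrating federated training rounds across HPC facilities with Globus Compute. As a rough sketch of that pattern only, and not the paper's APPFL implementation, the example below dispatches a local-training function to two Globus Compute endpoints and combines the returned weights with sample-weighted federated averaging (FedAvg). The endpoint UUIDs, the `local_train` body, and the sample counts are all hypothetical placeholders for illustration.

```python
# Minimal sketch of one cross-facility federated-averaging round via
# Globus Compute. NOT the paper's APPFL implementation; endpoint IDs and
# the training payload are hypothetical placeholders.
import numpy as np
from globus_compute_sdk import Executor

# Hypothetical endpoint UUIDs, one per participating HPC facility.
ENDPOINTS = [
    "00000000-0000-0000-0000-000000000001",  # e.g., facility A
    "00000000-0000-0000-0000-000000000002",  # e.g., facility B
]

def local_train(global_weights):
    """Runs remotely on an HPC endpoint: one local training pass.

    Placeholder body: a real client would load facility-local data, run
    local epochs, and return updated weights plus its sample count so the
    server can weight the average.
    """
    import numpy as np  # imports must live inside the shipped function
    updated = [w + 0.01 * np.random.randn(*w.shape) for w in global_weights]
    n_samples = 1000  # placeholder local dataset size
    return updated, n_samples

def fedavg_round(global_weights):
    """Dispatch training to every endpoint, then sample-weighted average."""
    executors = [Executor(endpoint_id=ep) for ep in ENDPOINTS]
    try:
        futures = [ex.submit(local_train, global_weights) for ex in executors]
        results = [f.result() for f in futures]  # [(weights, n_samples), ...]
    finally:
        for ex in executors:
            ex.shutdown()
    total = sum(n for _, n in results)
    return [
        sum(n * w[i] for w, n in results) / total
        for i in range(len(global_weights))
    ]

if __name__ == "__main__":
    weights = [np.zeros((4, 4)), np.zeros(4)]  # toy model parameters
    weights = fedavg_round(weights)
```

In this sketch the aggregation happens wherever the script runs, while the raw data never leaves each facility, which is the core FL property the abstract relies on; the paper's framework additionally handles heterogeneous schedulers and uses Globus Transfer for moving larger artifacts.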