[2603.25687] On Neural Scaling Laws for Weather Emulation through Continual Training
Computer Science > Machine Learning

arXiv:2603.25687 (cs)
[Submitted on 26 Mar 2026]

Title: On Neural Scaling Laws for Weather Emulation through Continual Training
Authors: Shashank Subramanian, Alexander Kiefer, Arnur Nigmetov, Amir Gholami, Dmitriy Morozov, Michael W. Mahoney

Abstract: Neural scaling laws, which in some domains can predict the performance of large neural networks as a function of model, data, and compute scale, are the cornerstone of building foundation models in Natural Language Processing and Computer Vision. We study neural scaling in Scientific Machine Learning, focusing on models for weather forecasting. To analyze scaling behavior in as simple a setting as possible, we adopt a minimal, scalable, general-purpose Swin Transformer architecture, and we use continual training with constant learning rates and periodic cooldowns as an efficient training strategy. We show that models trained in this minimalist way follow predictable scaling trends and even outperform standard cosine learning rate schedules. Cooldown phases can be re-purposed to improve downstream performance, e.g., enabling accurate multi-step rollouts over longer forecast horizons as well as sharper predictions through spectral loss adjustments. We also systematically explore a wide range of model and dataset sizes un...
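For context, scaling-law analyses of this kind typically fit a parametric form relating held-out loss L to parameter count N and dataset size D. The Chinchilla-style ansatz below is the standard reference point; the abstract does not state the paper's fitted functional form, so this is illustrative only:

    L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}

where E is the irreducible loss and A, B, \alpha, \beta are fitted constants.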
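The continual-training strategy the abstract describes, a constant learning rate with periodic cooldowns, follows the warmup-stable-decay pattern: train on a constant plateau, periodically branch off a short cooldown to obtain a deployable checkpoint, then resume the plateau from the pre-cooldown checkpoint. Below is a minimal sketch under that reading; the function name lr_at_step, the linear cooldown shape, and all hyperparameter values are illustrative assumptions, not the paper's configuration.

# Minimal sketch of a constant-LR schedule with branched cooldowns
# (warmup-stable-decay style). Names and hyperparameters are illustrative,
# not the paper's exact recipe.

def lr_at_step(step, base_lr=3e-4, warmup_steps=1_000,
               cooldown_start=None, cooldown_steps=2_000):
    """Learning rate at a given optimizer step.

    Linear warmup -> constant plateau. If cooldown_start is set, the rate
    decays linearly to zero over cooldown_steps. The cooldown is run on a
    branch of the trajectory, so the main run can resume from the
    pre-cooldown checkpoint at the constant rate afterwards.
    """
    if step < warmup_steps:                          # linear warmup
        return base_lr * step / warmup_steps
    if cooldown_start is not None and step >= cooldown_start:
        frac = (step - cooldown_start) / cooldown_steps
        return base_lr * max(0.0, 1.0 - frac)        # linear decay to zero
    return base_lr                                   # constant plateau


if __name__ == "__main__":
    print(lr_at_step(5_000))                          # plateau: 0.0003
    print(lr_at_step(11_000, cooldown_start=10_000))  # mid-cooldown: 0.00015

Because each cooldown is a branch, one long run can yield many evaluable checkpoints, and, as the abstract notes, the cooldown objective can be adjusted (e.g., toward multi-step rollout or spectral losses) to target a specific downstream use.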