[2506.22447] Vision Transformers for Multi-Variable Climate Downscaling: Emulating Regional Climate Models with a Shared Encoder and Multi-Decoder Architecture
Summary
This paper presents a multi-variable Vision Transformer (ViT) architecture for climate downscaling that improves accuracy and efficiency over single-variable models.
Why It Matters
The research addresses the high computational cost of Regional Climate Models and the inefficiency of single-variable deep-learning downscalers by downscaling multiple climate variables simultaneously. This has significant implications for regional climate studies, improving predictive capability while reducing computational cost.
Key Takeaways
- The proposed 1EMD architecture outperforms single-variable models in accuracy.
- It reduces computational costs by 29-32% compared to traditional methods.
- The model predicts six key climate variables simultaneously, enhancing contextual awareness.
arXiv:2506.22447 (cs.LG) [Submitted on 12 Jun 2025 (v1), last revised 16 Feb 2026 (this version, v2)]
Authors: Fabio Merizzi, Harilaos Loukos
Abstract: Global Climate Models (GCMs) are critical for simulating large-scale climate dynamics, but their coarse spatial resolution limits their applicability in regional studies. Regional Climate Models (RCMs) address this limitation through dynamical downscaling, albeit at considerable computational cost and with limited flexibility. Deep learning has emerged as an efficient data-driven alternative; however, most existing approaches focus on single-variable models that downscale one variable at a time. This paradigm can lead to redundant computation, limited contextual awareness, and weak cross-variable consistency. To address these limitations, we propose a multi-variable Vision Transformer (ViT) architecture with a shared encoder and variable-specific decoders (1EMD). The proposed model jointly predicts six key climate variables: surface temperature, wind speed, 500 hPa geopotential height, total precipitation...
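The shared-encoder / multi-decoder design described in the abstract can be sketched in PyTorch. This is a minimal illustrative sketch, not the paper's implementation: the patch size, embedding dimension, encoder depth, grid size, and PixelShuffle decoder heads are all assumptions chosen for clarity.

```python
import torch
import torch.nn as nn

class SharedEncoderMultiDecoder(nn.Module):
    """Sketch of a 1EMD-style model: one ViT encoder shared across all
    variables, plus one lightweight decoder head per output variable.
    All hyperparameters here are illustrative, not the paper's settings."""

    def __init__(self, num_vars=6, in_ch=6, patch=8, dim=256, depth=4,
                 grid=32, upscale=4):
        super().__init__()
        self.patch, self.grid = patch, grid
        n_patches = (grid // patch) ** 2
        # Patch embedding: each patch of the coarse multi-variable
        # input field becomes one token.
        self.embed = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # One decoder per variable: a 1x1 conv followed by PixelShuffle
        # maps the token grid back to the fine (upscaled) spatial grid.
        self.decoders = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(dim, (patch * upscale) ** 2, kernel_size=1),
                nn.PixelShuffle(patch * upscale),
            )
            for _ in range(num_vars)
        ])

    def forward(self, x):
        b = x.size(0)
        tok = self.embed(x).flatten(2).transpose(1, 2)   # (B, N, dim)
        tok = self.encoder(tok + self.pos)               # shared encoding
        side = self.grid // self.patch
        feat = tok.transpose(1, 2).reshape(b, -1, side, side)
        # One high-resolution map per variable.
        return [dec(feat) for dec in self.decoders]
```

For example, with the defaults above, a batch of coarse 32x32 fields with six input channels yields six single-channel 128x128 outputs, one per predicted variable; all variables share one encoder forward pass, which is the source of the computational savings over running six independent single-variable models.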