[2505.23522] OmniEarth-Bench: Towards Holistic Evaluation of Earth's Six Spheres and Cross-Spheres Interactions with Multimodal Observational Earth Data
Summary
The article introduces OmniEarth-Bench, a comprehensive benchmark for evaluating interactions across Earth's six spheres using multimodal observational data, highlighting significant gaps in current machine learning models' performance.
Why It Matters
This research addresses the limitations of existing Earth-science benchmarks, which cover only a few of Earth's spheres, by providing a holistic framework that spans all six spheres of the Earth system and the interactions between them. Coverage of this kind is needed to measure how well machine learning models handle the coupled, cross-sphere processes that Earth science studies, and to show where those models fall short.
Key Takeaways
- OmniEarth-Bench is the first benchmark covering all six Earth spheres.
- The benchmark includes 29,855 expert-curated annotations across 109 tasks.
- None of the evaluated state-of-the-art models reaches 35% accuracy on the benchmark.
- The benchmark is built with a scalable, modular-topology data inference framework that draws on native multi-observation sources.
- This research highlights critical gaps in the Earth-system cognitive abilities of existing models.
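The 35% figure is a single headline number, and how per-question scores are pooled affects it. Below is a minimal Python sketch on made-up toy data that contrasts micro-averaged accuracy (every question weighs equally) with macro-averaged accuracy (every task weighs equally). `Prediction`, `micro_accuracy`, and `macro_accuracy` are hypothetical names, and this summary does not say which averaging the benchmark's reported accuracy uses.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Prediction:
    # Hypothetical record: one scored answer to one benchmark question.
    task: str       # task identifier (the benchmark defines 109 tasks)
    correct: bool   # whether the model's answer matched the expert annotation


def micro_accuracy(preds: list[Prediction]) -> float:
    """Fraction of all questions answered correctly (each question weighs equally)."""
    return sum(p.correct for p in preds) / len(preds)


def macro_accuracy(preds: list[Prediction]) -> float:
    """Mean of per-task accuracies (each task weighs equally, regardless of size)."""
    by_task: dict[str, list[bool]] = defaultdict(list)
    for p in preds:
        by_task[p.task].append(p.correct)
    per_task = [sum(v) / len(v) for v in by_task.values()]
    return sum(per_task) / len(per_task)


# Toy data only: not results from the paper.
preds = [
    Prediction("task_a", True), Prediction("task_a", False),
    Prediction("task_a", False), Prediction("task_a", False),
    Prediction("task_b", True),
]
print(f"micro={micro_accuracy(preds):.2f} macro={macro_accuracy(preds):.3f}")
```

With these toy records, micro accuracy is 0.40 while macro accuracy is 0.625, because the single correct answer in the one-question task accounts for half of the macro average.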
arXiv:2505.23522, Computer Science: Computer Vision and Pattern Recognition (cs.CV). Submitted on 29 May 2025 (v1); last revised 15 Feb 2026 (v3).
Authors: Fengxiang Wang, Mingshuo Chen, Xuming He, Yi-Fan Zhang, Yueying Li, Feng Liu, Zijie Guo, Zhenghao Hu, Jiong Wang, Jingyi Xu, Zhangrui Li, Junchao Gong, Di Wang, Fenghua Ling, Ben Fei, Weijia Li, Long Lan, Wenjing Yang
Abstract: Existing benchmarks for multimodal learning in Earth science offer limited, siloed coverage of Earth's spheres and their cross-sphere interactions, typically restricting evaluation to the human-activity sphere or the atmosphere and to at most 16 tasks. These benchmarks suffer from three limitations: narrow-source heterogeneity (single or few data sources), constrained scientific granularity, and limited-sphere extensibility. Therefore, we introduce OmniEarth-Bench, the first multimodal benchmark that systematically spans all six spheres (atmosphere, lithosphere, oceanosphere, cryosphere, biosphere, and human-activity sphere) as well as cross-sphere interactions. Built with a scalable, modular-topology data inference framework and native multi-observation sources ...
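The abstract organizes the benchmark around six spheres plus a cross-sphere category. Below is a minimal Python sketch of that taxonomy. It assumes, hypothetically, that each item is tagged with the set of spheres it involves, and that two or more tags make it cross-sphere. The names `Sphere`, `Item`, and `bucket` are illustrative and do not reflect the benchmark's actual schema or data format.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Sphere(Enum):
    # The six spheres named in the abstract.
    ATMOSPHERE = "atmosphere"
    LITHOSPHERE = "lithosphere"
    OCEANOSPHERE = "oceanosphere"
    CRYOSPHERE = "cryosphere"
    BIOSPHERE = "biosphere"
    HUMAN_ACTIVITY = "human-activity"


@dataclass(frozen=True)
class Item:
    # Hypothetical benchmark item: the spheres its question draws on.
    question_id: str
    spheres: frozenset[Sphere]

    @property
    def is_cross_sphere(self) -> bool:
        # Assumption: an item is cross-sphere if it involves two or more spheres.
        return len(self.spheres) >= 2


def bucket(item: Item) -> str:
    """Assign an item to its single sphere or to the cross-sphere bucket."""
    if item.is_cross_sphere:
        return "cross-sphere"
    (only,) = item.spheres  # raises ValueError if the item has no sphere tag
    return only.value


# Illustrative items only: not drawn from the benchmark.
items = [
    Item("q1", frozenset({Sphere.ATMOSPHERE})),
    Item("q2", frozenset({Sphere.CRYOSPHERE, Sphere.OCEANOSPHERE})),
    Item("q3", frozenset({Sphere.BIOSPHERE})),
]
print(Counter(bucket(i) for i in items))
```

Keeping cross-sphere items in their own bucket, rather than counting them under every sphere they touch, means per-bucket counts sum to the total number of items.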