[2602.21351] A Hierarchical Multi-Agent System for Autonomous Discovery in Geoscientific Data Archives
Summary
The paper presents PANGAEA-GPT, a hierarchical multi-agent system designed to enhance autonomous data discovery in geoscientific archives, addressing the challenge of underutilized datasets.
Why It Matters
As Earth science data volumes grow, methods for discovering and analyzing existing datasets become crucial to getting value from them. The paper proposes a framework for querying and analyzing heterogeneous repository data through coordinated agent workflows, which could make underutilized datasets easier to find and reuse.
Key Takeaways
- PANGAEA-GPT uses a centralized Supervisor-Worker topology to coordinate its agents.
- The system combines strict data-type-aware routing, sandboxed deterministic code execution, and self-correction via execution feedback, so agents can diagnose and resolve runtime errors.
- Use-case scenarios spanning physical oceanography and ecology show it executing complex, multi-step workflows with minimal human intervention.
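As an illustration of what a Supervisor-Worker topology with data-type-aware routing can look like, here is a minimal Python sketch. It is not the paper's implementation: the `Dataset` fields, the worker functions, the `Supervisor` class, and the data-type labels `"tabular"` and `"gridded"` are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Dataset:
    """Hypothetical dataset descriptor; the fields are illustrative only."""
    dataset_id: str
    data_type: str  # assumed labels, e.g. "tabular" or "gridded"


def tabular_worker(ds: Dataset) -> str:
    # Stand-in for an agent specialised in tabular data.
    return f"tabular analysis of {ds.dataset_id}"


def gridded_worker(ds: Dataset) -> str:
    # Stand-in for an agent specialised in gridded data.
    return f"gridded analysis of {ds.dataset_id}"


class Supervisor:
    """Central coordinator that dispatches each dataset to one worker."""

    def __init__(self, workers: Dict[str, Callable[[Dataset], str]]):
        self.workers = workers

    def route(self, ds: Dataset) -> str:
        # Strict routing: an unknown data type is an error, not a guess.
        worker = self.workers.get(ds.data_type)
        if worker is None:
            raise ValueError(f"no worker registered for data type {ds.data_type!r}")
        return worker(ds)


supervisor = Supervisor({"tabular": tabular_worker, "gridded": gridded_worker})
print(supervisor.route(Dataset("example-001", "gridded")))
```

Raising on an unregistered data type, rather than falling back to a default worker, is one way to read "strict" routing; the actual system may handle this differently.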
Computer Science > Artificial Intelligence
arXiv:2602.21351 (cs)
[Submitted on 24 Feb 2026]
Title: A Hierarchical Multi-Agent System for Autonomous Discovery in Geoscientific Data Archives
Authors: Dmitrii Pantiukhin, Ivan Kuznetsov, Boris Shapkin, Antonia Anna Jost, Thomas Jung, Nikolay Koldunov
Abstract: The rapid accumulation of Earth science data has created a significant scalability challenge; while repositories like PANGAEA host vast collections of datasets, citation metrics indicate that a substantial portion remains underutilized, limiting data reusability. Here we present PANGAEA-GPT, a hierarchical multi-agent framework designed for autonomous data discovery and analysis. Unlike standard Large Language Model (LLM) wrappers, our architecture implements a centralized Supervisor-Worker topology with strict data-type-aware routing, sandboxed deterministic code execution, and self-correction via execution feedback, enabling agents to diagnose and resolve runtime errors. Through use-case scenarios spanning physical oceanography and ecology, we demonstrate the system's capacity to execute complex, multi-step workflows with minimal human intervention. This framework provides a methodology for querying and analyzing heterogeneous repository data through coordinated agent workflows.
Com...
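The abstract's "self-correction via execution feedback" can be pictured as a retry loop in which the error text from a failed run is handed back to whatever proposes the code. The sketch below is an assumption-laden toy, not the paper's method: `run_with_feedback` and `toy_proposer` are hypothetical names, the proposer is a hard-coded function standing in for an LLM agent, and plain `exec` is used only for brevity and provides no real sandboxing.

```python
import traceback
from typing import Callable, Optional, Tuple


def run_with_feedback(
    propose: Callable[[Optional[str]], str],
    max_attempts: int = 3,
) -> Tuple[dict, int]:
    """Run proposed code; on failure, pass the traceback to the next proposal."""
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        code = propose(feedback)
        namespace: dict = {}
        try:
            # NOTE: exec is NOT a sandbox; a real system would isolate this step.
            exec(code, namespace)
            return namespace, attempt
        except Exception:
            # Keep the full traceback as feedback for the next attempt.
            feedback = traceback.format_exc()
    raise RuntimeError(f"gave up after {max_attempts} attempts:\n{feedback}")


def toy_proposer(feedback: Optional[str]) -> str:
    # Hypothetical stand-in for an agent: the first try has a bug,
    # and the retry (made once feedback exists) fixes it.
    if feedback is None:
        return "result = 10 / zero"  # fails: 'zero' is undefined
    return "result = 10 / 2"


namespace, attempts = run_with_feedback(toy_proposer)
print(namespace["result"], attempts)
```

Capping the number of attempts keeps a proposer that never converges from looping forever; the limit of three is arbitrary.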