[2510.23587] A Survey of Data Agents: Emerging Paradigm or Overstated Hype?
Summary
This survey examines data agents, autonomous systems designed to orchestrate Data + AI ecosystems for complex data-related tasks. It introduces a six-level hierarchical taxonomy, inspired by the SAE J3016 standard for driving automation, to address the terminological ambiguity that currently surrounds the term.
Why It Matters
As large language models advance, understanding data agents becomes crucial for their effective deployment in AI ecosystems. This survey provides a structured framework that can guide researchers and practitioners in navigating the complexities of data agent technology, fostering clearer expectations and accountability.
Key Takeaways
- Introduces a systematic taxonomy for data agents, clarifying their capabilities.
- Addresses terminological ambiguities that hinder industry growth and user understanding.
- Traces the progressive shift in autonomy from manual operations (L0) to a vision of generative, fully autonomous data agents (L5).
- Reviews existing research on data agents, focusing on increasing autonomy.
- Offers a roadmap for future developments in proactive, generative data agents.
Computer Science > Databases
arXiv:2510.23587 (cs)
[Submitted on 27 Oct 2025 (v1), last revised 24 Feb 2026 (this version, v2)]
Title: A Survey of Data Agents: Emerging Paradigm or Overstated Hype?
Authors: Yizhang Zhu, Liangwei Wang, Chenyu Yang, Xiaotian Lin, Boyan Li, Wei Zhou, Xinyu Liu, Zhangyang Peng, Tianqi Luo, Yu Li, Chengliang Chai, Chong Chen, Shimin Di, Ju Fan, Ji Sun, Nan Tang, Fugee Tsung, Jiannan Wang, Chenglin Wu, Yanwei Xu, Shaolei Zhang, Yong Zhang, Xuanhe Zhou, Guoliang Li, Yuyu Luo
Abstract: The rapid advancement of large language models (LLMs) has spurred the emergence of data agents, autonomous systems designed to orchestrate Data + AI ecosystems for tackling complex data-related tasks. However, the term "data agent" currently suffers from terminological ambiguity and inconsistent adoption, conflating simple query responders with sophisticated autonomous architectures. This terminological ambiguity fosters mismatched user expectations, accountability challenges, and barriers to industry growth. Inspired by the SAE J3016 standard for driving automation, this survey introduces the first systematic hierarchical taxonomy for data agents, comprising six levels that delineate and trace progressive shifts in autonomy, from manual operations (L0) to a vision of generative, fully autonomous data agents (L5...