[2509.21825] DS-STAR: Data Science Agent for Solving Diverse Tasks across Heterogeneous Formats and Open-Ended Queries
Summary
The paper introduces DS-STAR, a data science agent designed to automate complex workflows by processing and integrating data across heterogeneous formats and by generating comprehensive research reports for open-ended queries. It achieves state-of-the-art performance on four benchmarks: DABStep, DABStep-Research, KramaBench, and DA-Code.
Why It Matters
Real-world data science workflows often require exploring multiple sources and synthesizing open-ended insights, which existing LLM-based agents struggle to do. An agent that handles both could improve productivity and accuracy in data analysis, which makes DS-STAR relevant to researchers and practitioners in AI and data science.
Key Takeaways
- DS-STAR effectively integrates data from heterogeneous formats.
- It moves beyond simple QA to generate comprehensive research reports for open-ended queries.
- The agent achieves state-of-the-art performance on four benchmarks: DABStep, DABStep-Research, KramaBench, and DA-Code.
- Its largest gains over baseline models are on hard-level QA tasks that require multi-file processing.
- DS-STAR's reports were preferred over those of the best baseline model in over 88% of cases.
Computer Science > Artificial Intelligence
arXiv:2509.21825 (cs)
[Submitted on 26 Sep 2025 (v1), last revised 24 Feb 2026 (this version, v4)]
Authors: Jaehyun Nam, Jinsung Yoon, Jiefeng Chen, Raj Sinha, Jinwoo Shin, Tomas Pfister
Abstract: While large language models (LLMs) have shown promise in automating data science, existing agents often struggle with the complexity of real-world workflows that require exploring multiple sources and synthesizing open-ended insights. In this paper, we introduce DS-STAR, a specialized agent to bridge this gap. Unlike prior approaches, DS-STAR is designed to (1) seamlessly process and integrate data across diverse, heterogeneous formats, and (2) move beyond simple QA to generate comprehensive research reports for open-ended queries. Extensive evaluation shows that DS-STAR achieves state-of-the-art performance on four benchmarks: DABStep, DABStep-Research, KramaBench, and DA-Code. Most notably, it significantly outperforms existing baseline models especially in hard-level QA tasks requiring multi-file processing, and generates high-quality data science reports that are preferred over the best baseline model in over 88% of cases.
Subj...