[2603.20311] kRAIG: A Natural Language-Driven Agent for Automated DataOps Pipeline Generation
arXiv:2603.20311 (cs), Computer Science > Software Engineering
Submitted on 19 Mar 2026
Title: kRAIG: A Natural Language-Driven Agent for Automated DataOps Pipeline Generation
Authors: Rohan Siva, Kai Cheung, Lichi Li, Ganesh Sundaram
Abstract: Modern machine learning systems rely on complex data engineering workflows to extract, transform, and load (ETL) data into production pipelines. However, constructing these pipelines remains time-consuming and requires substantial expertise in data infrastructure and orchestration frameworks. Recent advances in large language model (LLM) agents offer a potential path toward automating these workflows, but existing approaches struggle with under-specified user intent, unreliable tool generation, and limited guarantees of executable outputs. We introduce kRAIG, an AI agent that translates natural language specifications into production-ready Kubeflow Pipelines (KFP). To resolve ambiguity in user intent, we propose ReQuesAct (Reason, Question, Act), an interaction framework that explicitly clarifies intent prior to pipeline synthesis. The system orchestrates end-to-end data movement from diverse sources and generates task-specific transformation components through a retrieval-augmented tool synthesis process. To ensure data quality and safety, kRAIG incorporates LL...