[2603.03589] stratum: A System Infrastructure for Massive Agent-Centric ML Workloads
Computer Science > Databases
arXiv:2603.03589 (cs)
[Submitted on 3 Mar 2026]

Title: stratum: A System Infrastructure for Massive Agent-Centric ML Workloads
Authors: Arnab Phani, Elias Strauss, Sebastian Schelter

Abstract: Recent advances in large language models (LLMs) transform how machine learning (ML) pipelines are developed and evaluated. LLMs enable a new type of workload, agentic pipeline search, in which autonomous or semi-autonomous agents generate, validate, and optimize complete ML pipelines. These agents predominantly operate over popular Python ML libraries and exhibit highly exploratory behavior. This results in thousands of executions for data profiling, pipeline generation, and iterative refinement of pipeline stages. However, the existing Python-based ML ecosystem is built around libraries such as Pandas and scikit-learn, which are designed for human-centric, interactive, sequential workflows and remain constrained by Python's interpretive execution model, library-level isolation, and limited runtime support for executing large numbers of pipelines. Meanwhile, many high-performance ML systems proposed by the systems community either target narrow workload classes or require specialized programming models, which limits their integration with the Python ML ecosystem and makes them largely ill-suit...