[2604.04872] Synthetic Sandbox for Training Machine Learning Engineering Agents
Computer Science > Computation and Language

arXiv:2604.04872 (cs)

[Submitted on 6 Apr 2026]

Title: Synthetic Sandbox for Training Machine Learning Engineering Agents

Authors: Yuhang Zhou, Lizhu Zhang, Yifan Wu, Jiayi Liu, Xiangjun Fan, Zhuokai Zhao, Hong Yan

Abstract: As large language model agents advance beyond software engineering (SWE) tasks toward machine learning engineering (MLE), verifying agent behavior becomes orders of magnitude more expensive: while SWE tasks can be verified via fast-executing unit tests, MLE verification requires running full ML pipelines -- data preprocessing, model training, and metric evaluation -- on large datasets at each rollout step, rendering trajectory-wise on-policy reinforcement learning (RL) prohibitively slow. Existing approaches retreat to supervised fine-tuning (SFT) or offline proxy rewards, sacrificing the exploration and generalization benefits of on-policy RL. We observe that sandbox data size is the primary source of this bottleneck. Based on this insight, we introduce SandMLE, a multi-agent framework that generates diverse, verifiable synthetic MLE environments from a small number of seed tasks, preserving the structural and technical complexity of real-world problems while constraining datasets to micro-scale (each task is paired with only 50-200 training samples)....
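To make the abstract's core claim concrete, the following is a minimal, hypothetical sketch (not from the paper's code or benchmarks) of why micro-scale data makes MLE verification cheap: it builds a tiny synthetic binary-classification task of 100 training samples, runs a complete pipeline (preprocess, train, evaluate), and times it. All function names and the nearest-centroid model are illustrative assumptions; the point is only that a full pipeline over 50-200 samples finishes in milliseconds, fast enough to verify at every RL rollout step.

```python
# Hypothetical illustration of the micro-scale verification idea from the
# abstract; the task generator, model, and pipeline stages are all
# stand-ins, not the paper's actual SandMLE components.
import random
import time

random.seed(0)

def make_task(n=100):
    """Generate a micro-scale task: two noisy 2D clusters, one per label."""
    data = []
    for _ in range(n):
        label = random.randint(0, 1)
        cx = 2.0 if label else -2.0
        x = (cx + random.gauss(0, 1), random.gauss(0, 1))
        data.append((x, label))
    return data

def preprocess(data):
    """Center features at zero mean -- stands in for a data-prep stage."""
    n = len(data)
    mx = sum(x[0] for x, _ in data) / n
    my = sum(x[1] for x, _ in data) / n
    return [((x[0] - mx, x[1] - my), y) for (x, y) in data]

def train(data):
    """Nearest-centroid classifier -- stands in for model training."""
    centroids = {}
    for label in (0, 1):
        pts = [x for x, y in data if y == label]
        centroids[label] = (sum(p[0] for p in pts) / len(pts),
                            sum(p[1] for p in pts) / len(pts))
    return centroids

def evaluate(model, data):
    """Held-out accuracy -- stands in for metric evaluation."""
    correct = 0
    for x, y in data:
        pred = min(model, key=lambda c: (x[0] - model[c][0]) ** 2
                                        + (x[1] - model[c][1]) ** 2)
        correct += (pred == y)
    return correct / len(data)

start = time.perf_counter()
train_set = preprocess(make_task(100))   # micro-scale: 100 samples
test_set = preprocess(make_task(50))
accuracy = evaluate(train(train_set), test_set)
elapsed = time.perf_counter() - start
print(f"accuracy={accuracy:.2f}, full pipeline in {elapsed * 1000:.1f} ms")
```

Because the entire preprocess-train-evaluate loop runs in milliseconds at this data scale, a ground-truth metric (rather than an offline proxy reward) can plausibly be recomputed at each rollout step, which is the bottleneck the abstract says large datasets create.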