[2603.22435] CaP-X: A Framework for Benchmarking and Improving Coding Agents for Robot Manipulation
Computer Science > Robotics
arXiv:2603.22435 (cs)
[Submitted on 23 Mar 2026]

Title: CaP-X: A Framework for Benchmarking and Improving Coding Agents for Robot Manipulation

Authors: Max Fu, Justin Yu, Karim El-Refai, Ethan Kou, Haoru Xue, Huang Huang, Wenli Xiao, Guanzhi Wang, Fei-Fei Li, Guanya Shi, Jiajun Wu, Shankar Sastry, Yuke Zhu, Ken Goldberg, Linxi "Jim" Fan

Abstract: "Code-as-Policy" considers how executable code can complement data-intensive Vision-Language-Action (VLA) methods, yet their effectiveness as autonomous controllers for embodied manipulation remains underexplored. We present CaP-X, an open-access framework for systematically studying Code-as-Policy agents in robot manipulation. At its core is CaP-Gym, an interactive environment in which agents control robots by synthesizing and executing programs that compose perception and control primitives. Building on this foundation, CaP-Bench evaluates frontier language and vision-language models across varying levels of abstraction, interaction, and perceptual grounding. Across 12 models, CaP-Bench reveals a consistent trend: performance improves with human-crafted abstractions but degrades as these priors are removed, exposing a dependence on designer scaffolding. At the same time, we observe that this gap can be mitigated through scaling ...