[2512.01822] InnoGym: Benchmarking the Innovation Potential of AI Agents
Computer Science > Computation and Language
arXiv:2512.01822 (cs)
[Submitted on 1 Dec 2025 (v1), last revised 28 Feb 2026 (this version, v2)]

Title: InnoGym: Benchmarking the Innovation Potential of AI Agents

Authors: Jintian Zhang, Kewei Xu, Jingsheng Zheng, Zhuoyun Yu, Yuqi Zhu, Yujie Luo, Lanning Wei, Shuofei Qiao, Lun Du, Da Zheng, Shumin Deng, Huajun Chen, Ningyu Zhang

Abstract: LLMs and Agents have achieved impressive progress in code generation, mathematical reasoning, and scientific discovery. However, existing benchmarks primarily measure correctness, overlooking the diversity of methods behind solutions. True innovation depends not only on producing correct answers but also on the originality of the approach. We present InnoGym, the first benchmark and framework designed to systematically evaluate the innovation potential of AI agents. InnoGym introduces two complementary metrics: performance gain, which measures improvement over the best-known solutions, and novelty, which captures methodological differences from prior approaches. The benchmark includes 18 carefully curated tasks from real-world engineering and scientific domains, each standardized through resource filtering, evaluator validation, and solution collection. In addition, we provide iGym, a unified execution environment for reproducible and long-...