[2601.03111] One Sample to Rule Them All: Extreme Data Efficiency in Multidiscipline Reasoning with Reinforcement Learning
About this article
Abstract page for arXiv paper 2601.03111: One Sample to Rule Them All: Extreme Data Efficiency in Multidiscipline Reasoning with Reinforcement Learning
Computer Science > Machine Learning arXiv:2601.03111 (cs) [Submitted on 6 Jan 2026 (v1), last revised 2 Apr 2026 (this version, v2)] Title:One Sample to Rule Them All: Extreme Data Efficiency in Multidiscipline Reasoning with Reinforcement Learning Authors:Yiyuan Li, Zhen Huang, Yanan Wu, Weixun Wang, Xuefeng Li, Yijia Luo, Wenbo Su, Bo Zheng, Pengfei Liu View a PDF of the paper titled One Sample to Rule Them All: Extreme Data Efficiency in Multidiscipline Reasoning with Reinforcement Learning, by Yiyuan Li and 8 other authors View PDF HTML (experimental) Abstract:The reasoning ability of large language models (LLMs) can be unleashed with reinforcement learning (RL) (OpenAI, 2024; DeepSeek-AI et al., 2025a; Zeng et al., 2025). The success of existing RL attempts in LLMs usually rely on high-quality samples of large volumes. In this paper, we challenge conventional assumptions about data requirements in RL for LLMs by demonstrating the effectiveness of one-shot reinforcement learning. Specifically, we introduce polymath learning, a framework for designing one training sample that elicits multidisciplinary reasoning improvement. We present three key findings: (1) A single, strategically selected math reasoning sample can produce significant performance improvements across multiple domains, including physics, chemistry, and biology; (2) Analysis of salient mathematical skills provides insight into the characteristics associated with effective polymath samples; and (3) An engi...