[2601.03111] One Sample to Rule Them All: Extreme Data Efficiency in Multidiscipline Reasoning with Reinforcement Learning

arXiv - Machine Learning April 03, 2026 4 min read

About this article

Abstract page for arXiv paper 2601.03111: One Sample to Rule Them All: Extreme Data Efficiency in Multidiscipline Reasoning with Reinforcement Learning

Computer Science > Machine Learning

arXiv:2601.03111 (cs)

[Submitted on 6 Jan 2026 (v1), last revised 2 Apr 2026 (this version, v2)]

Title: One Sample to Rule Them All: Extreme Data Efficiency in Multidiscipline Reasoning with Reinforcement Learning

Authors: Yiyuan Li, Zhen Huang, Yanan Wu, Weixun Wang, Xuefeng Li, Yijia Luo, Wenbo Su, Bo Zheng, Pengfei Liu

Abstract: The reasoning ability of large language models (LLMs) can be unleashed with reinforcement learning (RL) (OpenAI, 2024; DeepSeek-AI et al., 2025a; Zeng et al., 2025). The success of existing RL attempts in LLMs usually relies on large volumes of high-quality samples. In this paper, we challenge conventional assumptions about data requirements in RL for LLMs by demonstrating the effectiveness of one-shot reinforcement learning. Specifically, we introduce polymath learning, a framework for designing one training sample that elicits multidisciplinary reasoning improvement. We present three key findings: (1) A single, strategically selected math reasoning sample can produce significant performance improvements across multiple domains, including physics, chemistry, and biology; (2) Analysis of salient mathematical skills provides insight into the characteristics associated with effective polymath samples; and (3) An engi...

Originally published on April 03, 2026. Curated by AI News.

