[2601.03111] One Sample to Rule Them All: Extreme Data Efficiency in Multidiscipline Reasoning with Reinforcement Learning

[2601.03111] One Sample to Rule Them All: Extreme Data Efficiency in Multidiscipline Reasoning with Reinforcement Learning

arXiv - Machine Learning 4 min read

About this article

Abstract page for arXiv paper 2601.03111: One Sample to Rule Them All: Extreme Data Efficiency in Multidiscipline Reasoning with Reinforcement Learning

Computer Science > Machine Learning arXiv:2601.03111 (cs) [Submitted on 6 Jan 2026 (v1), last revised 2 Apr 2026 (this version, v2)] Title:One Sample to Rule Them All: Extreme Data Efficiency in Multidiscipline Reasoning with Reinforcement Learning Authors:Yiyuan Li, Zhen Huang, Yanan Wu, Weixun Wang, Xuefeng Li, Yijia Luo, Wenbo Su, Bo Zheng, Pengfei Liu View a PDF of the paper titled One Sample to Rule Them All: Extreme Data Efficiency in Multidiscipline Reasoning with Reinforcement Learning, by Yiyuan Li and 8 other authors View PDF HTML (experimental) Abstract:The reasoning ability of large language models (LLMs) can be unleashed with reinforcement learning (RL) (OpenAI, 2024; DeepSeek-AI et al., 2025a; Zeng et al., 2025). The success of existing RL attempts in LLMs usually rely on high-quality samples of large volumes. In this paper, we challenge conventional assumptions about data requirements in RL for LLMs by demonstrating the effectiveness of one-shot reinforcement learning. Specifically, we introduce polymath learning, a framework for designing one training sample that elicits multidisciplinary reasoning improvement. We present three key findings: (1) A single, strategically selected math reasoning sample can produce significant performance improvements across multiple domains, including physics, chemistry, and biology; (2) Analysis of salient mathematical skills provides insight into the characteristics associated with effective polymath samples; and (3) An engi...

Originally published on April 03, 2026. Curated by AI News.

Related Articles

Llms

Intel LLM-Scaler vllm-0.14.0-b8.2 released with official Arc Pro B70 support

submitted by /u/Fcking_Chuck [link] [comments]

Reddit - Artificial Intelligence · 1 min ·
Llms

What was the biggest thing to happen in the field of AI?

I personally think it’s either AlphaGo or ChatGPT. AlphaGo showed to the whole world that AIs can be better than its creators in an area ...

Reddit - Artificial Intelligence · 1 min ·
Llms

Training-time intervention yields 63.4% blind-pair human preference at matched val-loss (1.2B params, 320 judgments, p = 1.98 × 10⁻⁵) [R]

TL;DR. I ran a blind A/B preference evaluation between two 1.2B-parameter LMs trained on identical data (same order, same seed, 30K steps...

Reddit - Machine Learning · 1 min ·
I tried Gemini, ChatGPT, and Claude for a month on Android, and I have a clear winner for you
Llms

I tried Gemini, ChatGPT, and Claude for a month on Android, and I have a clear winner for you

The ultimate Android AI showdown

AI Tools & Products · 5 min ·
More in Llms: This Week Guide Trending

No comments

No comments yet. Be the first to comment!

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime