[2603.01223] Learn Hard Problems During RL with Reference Guided Fine-tuning


arXiv - Machine Learning 4 min read

About this article


Computer Science > Machine Learning
arXiv:2603.01223 (cs) [Submitted on 1 Mar 2026]

Title: Learn Hard Problems During RL with Reference Guided Fine-tuning
Authors: Yangzhen Wu, Shanda Li, Zixin Wen, Xin Zhou, Ameet Talwalkar, Yiming Yang, Wenhao Huang, Tianle Cai

Abstract: Reinforcement learning (RL) for mathematical reasoning can suffer from reward sparsity: on challenging problems, an LLM fails to sample any correct trajectories, so RL never receives meaningful positive feedback. At the same time, human-written reference solutions often accompany such problems (e.g., problems from AoPS), but directly fine-tuning on these solutions offers no benefit, because models often cannot imitate human proofs that lie outside their own reasoning distribution. We introduce Reference-Guided Fine-Tuning (ReGFT), a simple and effective method that uses human-written reference solutions to synthesize positive trajectories on hard problems and trains on them before RL. For each problem, we provide the model with a partial reference solution and let it generate its own reasoning trace, ensuring the resulting trajectories remain in the model's reasoning space while still benefiting from reference guidance. Fine-tuning on these ref...
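The data-synthesis step the abstract describes (show the model a partial reference solution, let it finish in its own words, keep only correct trajectories) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the prompt template, prefix fractions, sample counts, and the `sample_fn` / `take_prefix` helper names are all hypothetical, and the answer check is a naive string match.

```python
# Hypothetical sketch of ReGFT-style trajectory synthesis.
# All helper names, prompt wording, and hyperparameters are assumptions,
# not taken from the paper.

def take_prefix(reference: str, fraction: float) -> str:
    """Keep roughly the first `fraction` of the reference solution's lines."""
    steps = reference.split("\n")
    k = max(1, int(len(steps) * fraction))
    return "\n".join(steps[:k])

def synthesize_regft_data(problem, reference, answer, sample_fn,
                          fractions=(0.25, 0.5, 0.75), n_samples=4):
    """For each prefix length, let the model continue the solution in its
    own words; keep only trajectories whose final answer is correct."""
    accepted = []
    for frac in fractions:
        hint = take_prefix(reference, frac)
        prompt = (f"Problem: {problem}\n"
                  f"Partial reference solution:\n{hint}\n"
                  "Continue and finish the solution in your own words.")
        for _ in range(n_samples):
            trajectory = sample_fn(prompt)            # model's own reasoning trace
            if trajectory.strip().endswith(answer):   # naive correctness filter
                accepted.append({"prompt": problem, "completion": trajectory})
    return accepted

# Toy stand-in for an LLM sampler, just to exercise the loop.
def toy_sampler(prompt):
    return "Some reasoning in the model's own words...\nFinal answer: 42"

data = synthesize_regft_data("What is 6 * 7?",
                             "Step 1: compute 6*7.\nStep 2: the result is 42.",
                             "42", toy_sampler)
```

The accepted pairs would then serve as a supervised fine-tuning set before RL; because the continuations are sampled from the model itself, they stay inside its own reasoning distribution, which is the property the abstract emphasizes.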

Originally published on March 03, 2026. Curated by AI News.


