[2505.18116] NFT: Bridging Supervised Learning and Reinforcement Learning in Math Reasoning
Computer Science > Machine Learning
arXiv:2505.18116 (cs)
[Submitted on 23 May 2025 (v1), last revised 1 Mar 2026 (this version, v3)]

Title: NFT: Bridging Supervised Learning and Reinforcement Learning in Math Reasoning
Authors: Huayu Chen, Kaiwen Zheng, Qinsheng Zhang, Ganqu Cui, Lifan Yuan, Yin Cui, Haotian Ye, Tsung-Yi Lin, Ming-Yu Liu, Jun Zhu, Haoxiang Wang

Abstract: Reinforcement Learning (RL) has played a central role in the recent surge of LLMs' math abilities by enabling self-improvement through binary verifier signals. In contrast, Supervised Learning (SL) is rarely considered for such verification-driven training, largely due to its heavy reliance on reference answers and its inability to reflect on mistakes. In this work, we challenge the prevailing notion that self-improvement is exclusive to RL and propose Negative-aware Fine-Tuning (NFT) -- a supervised approach that enables LLMs to reflect on their failures and improve autonomously without external teachers. In online training, instead of discarding self-generated negative answers, NFT constructs an implicit negative policy to model them. This implicit policy is parameterized with the same positive LLM we optimize on positive data, enabling direct policy optimization on all of the LLM's generations. We conduct experiments on 7B and 32B ...
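The "implicit negative policy" idea can be read as a mixture decomposition. The following is a hedged toy sketch, not the paper's implementation: it assumes the sampling policy decomposes as pi_old = r * pi_pos + (1 - r) * pi_neg, where r is the per-question accuracy, so the negative policy is expressed implicitly through the positive policy being trained. All names (`pi_old`, `pi_pos`, `r`) and the 4-token vocabulary are illustrative assumptions.

```python
# Hedged sketch (not the authors' code): recovering an implicit negative
# policy from the mixture decomposition
#   pi_old = r * pi_pos + (1 - r) * pi_neg
# which implies
#   pi_neg = (pi_old - r * pi_pos) / (1 - r).
import numpy as np

def implicit_negative_policy(pi_old, pi_pos, r):
    """Recover pi_neg from the mixture decomposition; clip tiny negatives
    that can arise numerically so downstream log-likelihoods stay finite."""
    pi_neg = (pi_old - r * pi_pos) / (1.0 - r)
    return np.clip(pi_neg, 1e-12, None)

# Toy next-token distributions over a 4-token vocabulary (assumed values).
pi_old = np.array([0.40, 0.30, 0.20, 0.10])  # frozen sampling policy
pi_pos = np.array([0.55, 0.25, 0.15, 0.05])  # policy fit on positive answers
r = 0.5                                      # fraction of correct answers

pi_neg = implicit_negative_policy(pi_old, pi_pos, r)
print(pi_neg)        # implicit negative distribution
print(pi_neg.sum())  # sums to 1 when pi_old and pi_pos are proper
```

Because pi_neg is written entirely in terms of pi_pos, a supervised negative log-likelihood on failed answers under pi_neg back-propagates into the same positive model, which is one way to see how negatives can be used without a separate teacher network.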