[2505.19590] Learning to Reason without External Rewards
Computer Science > Machine Learning
arXiv:2505.19590 (cs)
[Submitted on 26 May 2025 (v1), last revised 2 Mar 2026 (this version, v3)]

Title: Learning to Reason without External Rewards
Authors: Xuandong Zhao, Zhewei Kang, Aosong Feng, Sergey Levine, Dawn Song

Abstract: Training large language models (LLMs) for complex reasoning via Reinforcement Learning with Verifiable Rewards (RLVR) is effective but limited by its reliance on costly, domain-specific supervision. We explore Reinforcement Learning from Internal Feedback (RLIF), a framework that enables LLMs to learn from intrinsic signals without external rewards or labeled data. We propose Intuitor, an RLIF method that uses a model's own confidence, termed self-certainty, as its sole reward signal. Intuitor replaces external rewards in Group Relative Policy Optimization (GRPO) with self-certainty scores, enabling fully unsupervised learning. Experiments demonstrate that Intuitor matches GRPO's performance on mathematical benchmarks while achieving better generalization to out-of-domain tasks such as code generation, without requiring gold solutions or test cases. Our findings show that intrinsic model signals can drive effective learning across domains, offering a scalable alternative to RLVR for autonomous AI systems where verifiable rewards are unavailable. Code is available at this https URL.
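As a rough illustration of the idea in the abstract, the sketch below computes a self-certainty-style confidence score as the average KL divergence between a uniform distribution over the vocabulary and the model's per-token predictive distributions, then turns a group of such scores into group-relative (z-normalized) advantages in the style of GRPO. The function names, the tiny hand-made distributions, and the exact normalization are illustrative assumptions, not the paper's reference implementation.

```python
import math

def self_certainty(token_dists):
    """Illustrative confidence score: mean KL(U || p_i) over the
    token-level predictive distributions p_i of one sampled response.
    A peaked (confident) distribution yields a higher score than a
    flat one, for which KL(U || U) = 0."""
    total = 0.0
    for p in token_dists:
        V = len(p)
        # KL(U || p) = sum_j (1/V) * log((1/V) / p_j)
        total += sum((1.0 / V) * math.log((1.0 / V) / pj) for pj in p)
    return total / len(token_dists)

def group_relative_advantages(scores):
    """GRPO-style normalization (assumed form): z-score each response's
    reward within its sampled group, so no external reward is needed --
    only the relative confidence ranking inside the group matters."""
    mean = sum(scores) / len(scores)
    var = sum((s - mean) ** 2 for s in scores) / len(scores)
    std = math.sqrt(var) or 1.0  # guard against a zero-variance group
    return [(s - mean) / std for s in scores]

# Toy example: one confident response vs. one uncertain response.
confident = [[0.97, 0.01, 0.01, 0.01]]   # peaked distribution
uncertain = [[0.25, 0.25, 0.25, 0.25]]   # uniform distribution
scores = [self_certainty(confident), self_certainty(uncertain)]
advantages = group_relative_advantages(scores)
print(advantages[0] > 0 > advantages[1])
```

In this toy group the confident response receives a positive advantage and the uncertain one a negative advantage, which is the mechanism by which intrinsic confidence alone can steer the policy update.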