[2512.18857] CORE: Concept-Oriented Reinforcement for Bridging the Definition-Application Gap in Mathematical Reasoning

[2512.18857] CORE: Concept-Oriented Reinforcement for Bridging the Definition-Application Gap in Mathematical Reasoning

arXiv - Machine Learning 4 min read

About this article

Abstract page for arXiv paper 2512.18857: CORE: Concept-Oriented Reinforcement for Bridging the Definition-Application Gap in Mathematical Reasoning

Computer Science > Artificial Intelligence arXiv:2512.18857 (cs) [Submitted on 21 Dec 2025 (v1), last revised 2 Mar 2026 (this version, v2)] Title:CORE: Concept-Oriented Reinforcement for Bridging the Definition-Application Gap in Mathematical Reasoning Authors:Zijun Gao, Zhikun Xu, Xiao Ye, Ben Zhou View a PDF of the paper titled CORE: Concept-Oriented Reinforcement for Bridging the Definition-Application Gap in Mathematical Reasoning, by Zijun Gao and 3 other authors View PDF HTML (experimental) Abstract:Large language models (LLMs) often solve challenging math exercises yet fail to apply the concept right when the problem requires genuine understanding. Popular Reinforcement Learning with Verifiable Rewards (RLVR) pipelines reinforce final answers but provide little fine-grained conceptual signal, so models improve at pattern reuse rather than conceptual applications. We introduce CORE (Concept-Oriented REinforcement), an RL training framework that turns explicit concepts into a controllable supervision signal. Starting from a high-quality, low-contamination textbook resource that links verifiable exercises to concise concept descriptions, we run a sanity probe showing LLMs can restate definitions but fail concept-linked quizzes, quantifying the conceptual reasoning gap. CORE then (i) synthesizes concept-aligned quizzes, (ii) injects brief concept snippets during rollouts to elicit concept-primed trajectories, and (iii) reinforces conceptual reasoning via trajectory rep...

Originally published on March 04, 2026. Curated by AI News.

Related Articles

Llms

persistent memory system for AI agents — single SQLite file, no external server, no API keys. free and opensource - BrainCTL

Every agent I build forgets everything between sessions. I got tired of it and built brainctl. pip install brainctl, then: from agentmemo...

Reddit - Artificial Intelligence · 1 min ·
Llms

How has Claude far surpassed the competitors? They were not first to market or ever had the most cash yet their feature are far and away the best on the market.

How has Claude far surpassed the competitors? They were not first to market or ever had the most cash yet their feature are far and away ...

Reddit - Artificial Intelligence · 1 min ·
Anthropic temporarily banned OpenClaw's creator from accessing Claude | TechCrunch
Llms

Anthropic temporarily banned OpenClaw's creator from accessing Claude | TechCrunch

This ban took place after Claude's pricing changed for OpenClaw users last week.

TechCrunch - AI · 5 min ·
Llms

I probably shouldn't be impressed, but I am.

So I just made this workout on a whiteboard and I was feeling lazy so I asked Claude to read it. And it did, almost flawlessly. I was and...

Reddit - Artificial Intelligence · 1 min ·
More in Llms: This Week Guide Trending

No comments

No comments yet. Be the first to comment!

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime