[2601.15160] Knowledge Graphs are Implicit Reward Models: Path-Derived Signals Enable Compositional Reasoning

arXiv - AI March 05, 2026 4 min read

About this article

Abstract page for arXiv paper 2601.15160: Knowledge Graphs are Implicit Reward Models: Path-Derived Signals Enable Compositional Reasoning

Computer Science > Artificial Intelligence

arXiv:2601.15160 (cs)

[Submitted on 21 Jan 2026 (v1), last revised 4 Mar 2026 (this version, v2)]

Title: Knowledge Graphs are Implicit Reward Models: Path-Derived Signals Enable Compositional Reasoning

Authors: Yuval Kansal, Niraj K. Jha

Abstract: Large language models have achieved near-expert performance in structured reasoning domains like mathematics and programming, yet their ability to perform compositional multi-hop reasoning in specialized scientific fields remains limited. We propose a bottom-up learning paradigm in which models are grounded in axiomatic domain facts and compose them to solve complex, unseen tasks. To this end, we present a post-training pipeline, based on a combination of supervised fine-tuning and reinforcement learning (RL), in which knowledge graphs act as implicit reward models. By deriving novel reward signals from knowledge graph paths, we provide verifiable, scalable, and grounded supervision that encourages models to compose intermediate axioms rather than optimize only final answers during RL. We validate this approach in the medical domain, training a 14B model on short-hop reasoning paths (1-3 hops) and evaluating its zero-shot generalization to complex multi-hop queries (4-5 hops). Our experiments show th...
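To make the idea of a path-derived reward concrete, here is a minimal Python sketch. The abstract does not give the paper's actual reward formula, so everything below is an assumption for illustration only: the `Triple` type, the substring-based `path_coverage` check, the `path_reward` function, and the `path_weight` mixing parameter are all hypothetical names, not the authors' method. The sketch only shows the general shape of the idea: reward a response partly for mentioning the intermediate facts along a knowledge graph path, and partly for the final answer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One knowledge graph edge (hypothetical representation)."""
    head: str
    relation: str
    tail: str


def path_coverage(path: list[Triple], reasoning: str) -> float:
    """Fraction of path triples whose head and tail both appear in the reasoning.

    Assumption: a crude case-insensitive substring match stands in for
    whatever verification a real pipeline would use.
    """
    if not path:
        return 0.0
    text = reasoning.lower()
    hits = sum(
        1 for t in path
        if t.head.lower() in text and t.tail.lower() in text
    )
    return hits / len(path)


def path_reward(path: list[Triple], reasoning: str, answer: str,
                gold: str, path_weight: float = 0.5) -> float:
    """Blend intermediate-fact coverage with final-answer correctness.

    Assumption: a simple weighted sum; the paper may combine signals differently.
    """
    answer_score = 1.0 if answer.strip().lower() == gold.strip().lower() else 0.0
    return path_weight * path_coverage(path, reasoning) + (1.0 - path_weight) * answer_score
```

With a two-hop toy path, a response that states only the first fact but gives the right final answer would receive partial credit for the path term and full credit for the answer term, which is the kind of signal that favors composing intermediate facts over guessing the answer.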

Originally published on March 05, 2026. Curated by AI News.
