[2604.05134] Reasoning Through Chess: How Reasoning Evolves from Data Through Fine-Tuning and Reinforcement Learning

[2604.05134] Reasoning Through Chess: How Reasoning Evolves from Data Through Fine-Tuning and Reinforcement Learning

arXiv - AI 4 min read

About this article

Abstract page for arXiv paper 2604.05134: Reasoning Through Chess: How Reasoning Evolves from Data Through Fine-Tuning and Reinforcement Learning

Computer Science > Machine Learning arXiv:2604.05134 (cs) [Submitted on 6 Apr 2026] Title:Reasoning Through Chess: How Reasoning Evolves from Data Through Fine-Tuning and Reinforcement Learning Authors:Lucas Dionisopoulos, Nicklas Majamaki, Prithviraj Ammanabrolu View a PDF of the paper titled Reasoning Through Chess: How Reasoning Evolves from Data Through Fine-Tuning and Reinforcement Learning, by Lucas Dionisopoulos and 2 other authors View PDF HTML (experimental) Abstract:How can you get a language model to reason in a task it natively struggles with? We study how reasoning evolves in a language model -- from supervised fine-tuning (SFT) to reinforcement learning (RL) -- by analyzing how a set of theoretically-inspired datasets impacts language model performance in chess. We find that fine-tuning a model to directly predict the best move leads to effective RL and the strongest downstream performance -- however, the RL step elicits unfaithful reasoning (reasoning inconsistent with the chosen move). Alternatively, training on multi-move trajectories yields comparable downstream performance with faithful reasoning and more stable RL. We show that RL induces a substantial positive shift in the distribution of move quality and reduces hallucination rates as a side effect. Finally, we find several SFT-checkpoint metrics -- metrics spanning evaluation performance, hallucination rates, and reasoning quality -- to be predictive of post-RL model performance. We release checkpoin...

Originally published on April 08, 2026. Curated by AI News.

Related Articles

[2604.16909] PRISM: Probing Reasoning, Instruction, and Source Memory in LLM Hallucinations
Llms

[2604.16909] PRISM: Probing Reasoning, Instruction, and Source Memory in LLM Hallucinations

Abstract page for arXiv paper 2604.16909: PRISM: Probing Reasoning, Instruction, and Source Memory in LLM Hallucinations

arXiv - AI · 4 min ·
[2604.07802] Latent Anomaly Knowledge Excavation: Unveiling Sparse Sensitive Neurons in Vision-Language Models
Llms

[2604.07802] Latent Anomaly Knowledge Excavation: Unveiling Sparse Sensitive Neurons in Vision-Language Models

Abstract page for arXiv paper 2604.07802: Latent Anomaly Knowledge Excavation: Unveiling Sparse Sensitive Neurons in Vision-Language Models

arXiv - AI · 4 min ·
[2602.07605] Fine-R1: Make Multi-modal LLMs Excel in Fine-Grained Visual Recognition by Chain-of-Thought Reasoning
Llms

[2602.07605] Fine-R1: Make Multi-modal LLMs Excel in Fine-Grained Visual Recognition by Chain-of-Thought Reasoning

Abstract page for arXiv paper 2602.07605: Fine-R1: Make Multi-modal LLMs Excel in Fine-Grained Visual Recognition by Chain-of-Thought Rea...

arXiv - AI · 4 min ·
[2602.07096] RealFin: How Well Do LLMs Reason About Finance When Users Leave Things Unsaid?
Llms

[2602.07096] RealFin: How Well Do LLMs Reason About Finance When Users Leave Things Unsaid?

Abstract page for arXiv paper 2602.07096: RealFin: How Well Do LLMs Reason About Finance When Users Leave Things Unsaid?

arXiv - AI · 3 min ·
More in Llms: This Week Guide Trending

No comments

No comments yet. Be the first to comment!

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime