[2604.05134] Reasoning Through Chess: How Reasoning Evolves from Data Through Fine-Tuning and Reinforcement Learning

arXiv - AI April 08, 2026 4 min read

Computer Science > Machine Learning
arXiv:2604.05134 (cs)
[Submitted on 6 Apr 2026]

Title: Reasoning Through Chess: How Reasoning Evolves from Data Through Fine-Tuning and Reinforcement Learning

Authors: Lucas Dionisopoulos, Nicklas Majamaki, Prithviraj Ammanabrolu

Abstract: How can you get a language model to reason in a task it natively struggles with? We study how reasoning evolves in a language model -- from supervised fine-tuning (SFT) to reinforcement learning (RL) -- by analyzing how a set of theoretically-inspired datasets impacts language model performance in chess. We find that fine-tuning a model to directly predict the best move leads to effective RL and the strongest downstream performance -- however, the RL step elicits unfaithful reasoning (reasoning inconsistent with the chosen move). Alternatively, training on multi-move trajectories yields comparable downstream performance with faithful reasoning and more stable RL. We show that RL induces a substantial positive shift in the distribution of move quality and reduces hallucination rates as a side effect. Finally, we find several SFT-checkpoint metrics -- metrics spanning evaluation performance, hallucination rates, and reasoning quality -- to be predictive of post-RL model performance. We release checkpoin...

Originally published on April 08, 2026. Curated by AI News.

Related Articles

[2604.16909] PRISM: Probing Reasoning, Instruction, and Source Memory in LLM Hallucinations

[2604.07802] Latent Anomaly Knowledge Excavation: Unveiling Sparse Sensitive Neurons in Vision-Language Models

[2602.07605] Fine-R1: Make Multi-modal LLMs Excel in Fine-Grained Visual Recognition by Chain-of-Thought Reasoning

[2602.07096] RealFin: How Well Do LLMs Reason About Finance When Users Leave Things Unsaid?
