[2604.06628] Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability
Computer Science > Artificial Intelligence
arXiv:2604.06628 (cs)
[Submitted on 8 Apr 2026]
Title: Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability
Authors: Qihan Ren, Peng Wang, Ruikun Cai, Shuai Shao, Dadi Guo, Yuejin Xie, Yafu Li, Quanshi Zhang, Xia Hu, Jing Shao, Dongrui Liu
Abstract: A prevailing narrative in LLM post-training holds that supervised finetuning (SFT) memorizes while reinforcement learning (RL) generalizes. We revisit this claim for reasoning SFT with long chain-of-thought (CoT) supervision and find that cross-domain generalization is not absent but conditional, jointly shaped by optimization dynamics, training data, and base-model capability. Some reported failures are under-optimization artifacts: cross-domain performance first degrades before recovering and improving with extended training (a dip-and-recovery pattern), so short-training checkpoints can underestimate generalization. Data quality and structure both matter: low-quality solutions broadly hurt generalization, while verified long-CoT traces yield consistent cross-domain gains. Model capability is essential: stronger models internalize transferable procedural patterns (e.g., backtracking) even from a toy arithmetic game, while weaker ones imitate surface...