[2604.06628] Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability
Computer Science > Artificial Intelligence
arXiv:2604.06628 (cs)
[Submitted on 8 Apr 2026]
Title: Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability
Authors: Qihan Ren, Peng Wang, Ruikun Cai, Shuai Shao, Dadi Guo, Yuejin Xie, Yafu Li, Quanshi Zhang, Xia Hu, Jing Shao, Dongrui Liu
Abstract: A prevailing narrative in LLM post-training holds that supervised finetuning (SFT) memorizes while reinforcement learning (RL) generalizes. We revisit this claim for reasoning SFT with long chain-of-thought (CoT) supervision and find that cross-domain generalization is not absent but conditional, jointly shaped by optimization dynamics, training data, and base-model capability. Some reported failures are under-optimization artifacts: cross-domain performance first degrades before recovering and improving with extended training (a dip-and-recovery pattern), so short-training checkpoints can underestimate generalization. Data quality and structure both matter: low-quality solutions broadly hurt generalization, while verified long-CoT traces yield consistent cross-domain gains. Model capability is essential: stronger models internalize transferable procedural patterns (e.g., backtracking) even from a toy arithmetic game, while weaker ones imitate surface...