[2602.11549] Native Reasoning Models: Training Language Models to Reason on Unverifiable Data
Computer Science > Machine Learning
arXiv:2602.11549 (cs)
[Submitted on 12 Feb 2026 (v1), last revised 23 Mar 2026 (this version, v2)]

Title: Native Reasoning Models: Training Language Models to Reason on Unverifiable Data
Authors: Yuanfu Wang, Zhixuan Liu, Xiangtian Li, Chaochao Lu, Chao Yang

Abstract: The prevailing paradigm for training large reasoning models, combining Supervised Fine-Tuning (SFT) with Reinforcement Learning with Verifiable Rewards (RLVR), is fundamentally constrained by its reliance on high-quality, human-annotated reasoning data and external verifiers. This dependency incurs significant data-collection costs, risks embedding human cognitive biases, and confines the reinforcement learning stage to objectively assessable domains like mathematics and coding, leaving a wide range of unverifiable tasks beyond its scope. To overcome these limitations, we introduce NRT (Native Reasoning Training), a novel framework that cultivates complex reasoning by having the model generate its own reasoning traces using only standard question-answer pairs, thereby obviating the need for expert-written demonstrations. NRT reframes the training problem by treating the reasoning process as a latent variable. It employs a unified training objective that models reasoning as an optimization problem, intri...
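The abstract's framing of the reasoning trace as a latent variable can be illustrated with a generic EM-style toy, where self-generated traces are posterior-weighted by how well they explain the reference answer and the policy is pushed toward that posterior. This is only a minimal sketch of the latent-variable idea, not the paper's actual NRT objective; all names, probabilities, and the update rule are illustrative assumptions.

```python
import math

# Toy setup: one question, two candidate reasoning traces (latent z),
# and a fixed reference answer. Values are illustrative, not from the paper.
traces = ["z_good", "z_bad"]

# Policy over traces, parameterized by logits we will update.
logits = {"z_good": 0.0, "z_bad": 0.0}

# Likelihood of the reference answer given each trace (fixed toy values).
p_answer_given = {"z_good": 0.9, "z_bad": 0.1}

def trace_probs(logits):
    """Softmax over trace logits: the policy p(z | q)."""
    m = max(logits.values())
    exps = {z: math.exp(v - m) for z, v in logits.items()}
    s = sum(exps.values())
    return {z: e / s for z, e in exps.items()}

def em_step(logits, lr=1.0):
    """One EM-flavored update: posterior-weight each latent trace by how
    well it explains the answer, then move the policy toward that posterior."""
    probs = trace_probs(logits)
    # E-step: posterior p(z | q, a) ∝ p(z | q) * p(a | q, z)
    joint = {z: probs[z] * p_answer_given[z] for z in traces}
    norm = sum(joint.values())
    post = {z: j / norm for z, j in joint.items()}
    # M-step (gradient flavor): nudge logits toward the posterior weights
    for z in traces:
        logits[z] += lr * (post[z] - probs[z])
    return post

for _ in range(50):
    em_step(logits)

probs = trace_probs(logits)
# The policy concentrates on the trace that best explains the answer.
print(probs["z_good"] > probs["z_bad"])  # → True
```

Because the posterior always up-weights the trace with higher answer likelihood, repeated updates concentrate the policy on it, mirroring how training on question-answer pairs alone can still shape which reasoning traces the model prefers.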