[2603.01907] Efficient RLVR Training via Weighted Mutual Information Data Selection
Computer Science > Machine Learning
arXiv:2603.01907 (cs) [Submitted on 2 Mar 2026]
Title: Efficient RLVR Training via Weighted Mutual Information Data Selection
Authors: Xinyu Zhou, Boyu Zhu, Haotian Zhang, Huiming Wang, Zhijiang Guo

Abstract: Reinforcement learning (RL) plays a central role in improving the reasoning and alignment of large language models, yet its efficiency critically depends on how training data are selected. Existing online selection strategies rely predominantly on difficulty-based heuristics, favouring datapoints with intermediate success rates; they implicitly equate difficulty with informativeness and neglect the epistemic uncertainty that arises from limited evidence. We introduce InSight, an INformation-guided data SamplInG metHod for RL Training, grounded in a weighted mutual information objective. By modeling data outcomes with Bayesian latent success rates, we show that the expected uncertainty reduction decomposes into complementary difficulty- and evidence-dependent components, revealing a fundamental limitation of difficulty-only selection. Leveraging this observation, InSight constructs a stable acquisition score based on the mean belief of datapoints' success rather than noisy sampled outcomes, and naturally extends to multi-rollout settings common in reinforcement learning with verifiable...
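To make the decomposition concrete, the sketch below illustrates one standard way such an information-guided score can be computed: a Beta posterior over a datapoint's latent success rate, with the expected uncertainty reduction from one more rollout given by the mutual information between the rollout outcome and the latent rate (a BALD-style quantity). This is an illustrative sketch under those assumptions, not the paper's actual InSight acquisition score; the function name and prior parameters are hypothetical.

```python
import math
import random

def bernoulli_entropy(p):
    """Entropy (in nats) of a Bernoulli(p) outcome."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def acquisition_score(successes, failures, alpha0=1.0, beta0=1.0,
                      n_mc=20000, seed=0):
    """Illustrative BALD-style score: mutual information between the next
    rollout outcome and the latent success rate, under a Beta posterior.
    (Hypothetical helper, not the paper's InSight score.)"""
    a = alpha0 + successes   # Beta posterior parameters after the
    b = beta0 + failures     # observed rollout outcomes
    mu = a / (a + b)         # mean belief of the datapoint's success rate
    rng = random.Random(seed)
    # E_p[H(Bernoulli(p))] estimated by Monte Carlo over the Beta posterior
    expected_cond = sum(
        bernoulli_entropy(rng.betavariate(a, b)) for _ in range(n_mc)
    ) / n_mc
    # I(outcome; p) = H(Bernoulli(mu)) - E_p[H(Bernoulli(p))]
    return bernoulli_entropy(mu) - expected_cond
```

At a fixed mean belief of 0.5 (maximal difficulty), a datapoint with few observed rollouts scores higher than one with many, which is the evidence-dependent component that a difficulty-only heuristic cannot distinguish.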