[2602.16742] DeepVision-103K: A Visually Diverse, Broad-Coverage, and Verifiable Mathematical Dataset for Multimodal Reasoning
Summary
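DeepVision-103K is a dataset for training Large Multimodal Models (LMMs) with Reinforcement Learning with Verifiable Rewards (RLVR). It covers diverse K12 mathematical topics, extensive knowledge points, and rich visual elements, and models trained on it achieve strong performance on multimodal mathematical benchmarks.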
Why It Matters
Existing RLVR datasets are built mostly by small-scale manual construction or by recombining prior resources, which limits data diversity and coverage and constrains further gains in model performance. DeepVision-103K addresses this gap with a broad collection of K12 mathematical topics, knowledge points, and visual elements aimed at visual reasoning in large multimodal models.
Key Takeaways
- Models trained on DeepVision-103K show enhanced visual perception, reflection, and reasoning.
- The dataset covers a wide range of K12 mathematical topics and knowledge points.
- Models trained on DeepVision achieve strong performance on multimodal mathematical benchmarks and generalize to general multimodal reasoning tasks.
- The dataset targets RLVR training, where prior datasets were limited in diversity and coverage.
- The authors support DeepVision's effectiveness with further analysis of the trained models.
arXiv:2602.16742 (cs), Computer Science > Machine Learning
Submitted on 18 Feb 2026
Title: DeepVision-103K: A Visually Diverse, Broad-Coverage, and Verifiable Mathematical Dataset for Multimodal Reasoning
Authors: Haoxiang Sun, Lizhen Xu, Bing Zhao, Wotao Yin, Wei Wang, Boyu Yang, Rui Wang, Hu Wei
Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has been shown effective in enhancing the visual reflection and reasoning capabilities of Large Multimodal Models (LMMs). However, existing datasets are predominantly derived from either small-scale manual construction or recombination of prior resources, which limits data diversity and coverage, thereby constraining further gains in model performance. To this end, we introduce DeepVision-103K, a comprehensive dataset for RLVR training that covers diverse K12 mathematical topics, extensive knowledge points, and rich visual elements. Models trained on DeepVision achieve strong performance on multimodal mathematical benchmarks, and generalize effectively to general multimodal reasoning tasks. Further analysis reveals enhanced visual perception, reflection and reasoning capabilities in trained models, validating DeepVision's effectiveness for advancing multimodal reasoning. Data: \href{this https URL}{this...