[2603.21289] When Models Judge Themselves: Unsupervised Self-Evolution for Multimodal Reasoning
Computer Science > Computer Vision and Pattern Recognition
arXiv:2603.21289 (cs)
[Submitted on 22 Mar 2026]

Title: When Models Judge Themselves: Unsupervised Self-Evolution for Multimodal Reasoning
Authors: Zhengxian Wu, Kai Shi, Chuanrui Zhang, Zirui Liao, Jun Yang, Ni Yang, Qiuying Peng, Luyuan Zhang, Hangrui Xu, Tianhuang Su, Zhenyu Yang, Haonan Lu, Haoqian Wang

Abstract: Recent progress in multimodal large language models has led to strong performance on reasoning tasks, but these improvements largely rely on high-quality annotated data or teacher-model distillation, both of which are costly and difficult to scale. To address this, we propose an unsupervised self-evolution training framework for multimodal reasoning that achieves stable performance improvements without using human-annotated answers or external reward models. For each input, we sample multiple reasoning trajectories and jointly model their within-group consistency. We use the Actor's self-consistency signal as a training prior and introduce a bounded, Judge-based modulation to continuously reweight trajectories of different quality. We further model the modulated scores as a group-level distribution and convert absolute scores into relative advantages within each group, enabling more robust policy updates. Trained with Group...
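The group-level normalization step the abstract describes — modulating per-trajectory self-consistency scores with a bounded Judge signal, then converting the modulated scores into relative advantages within each sampled group — can be sketched as below. This is a minimal illustrative sketch only: the function and parameter names (`group_relative_advantages`, `judge_weights`, `clip`) are assumptions, and the paper's exact modulation and normalization formulas are not given in the abstract.

```python
import statistics

def group_relative_advantages(scores, judge_weights, clip=0.5):
    """Turn absolute per-trajectory scores into within-group relative advantages.

    scores        -- self-consistency-based scores for one input's sampled
                     reasoning trajectories (names are hypothetical).
    judge_weights -- Judge-based modulation factors for the same trajectories.
    clip          -- bound on the modulation, keeping it a reweighting
                     rather than letting the Judge dominate the prior.
    """
    # Bounded Judge-based modulation: clamp each weight to [1-clip, 1+clip]
    # before applying it to the Actor's self-consistency score.
    modulated = [
        s * max(1.0 - clip, min(1.0 + clip, w))
        for s, w in zip(scores, judge_weights)
    ]
    # Treat the modulated scores as a group-level distribution and
    # standardize them: advantage = (score - group mean) / group std.
    mean = statistics.fmean(modulated)
    std = statistics.pstdev(modulated) or 1.0  # guard against zero variance
    return [(m - mean) / std for m in modulated]
```

With neutral Judge weights, a trajectory scored above the group mean gets a positive advantage and one below the mean a negative advantage, so the policy update depends only on relative standing within the group, not on the absolute score scale.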