[2604.03647] Stabilizing Unsupervised Self-Evolution of MLLMs via Continuous Softened Retracing reSampling
Computer Science > Computer Vision and Pattern Recognition
arXiv:2604.03647 (cs)
[Submitted on 4 Apr 2026]

Title: Stabilizing Unsupervised Self-Evolution of MLLMs via Continuous Softened Retracing reSampling
Authors: Yunyao Yu, Zhengxian Wu, Zhuohong Chen, Hangrui Xu, Zirui Liao, Xiangwen Deng, Zhifang Liu, Senyuan Shi, Haoqian Wang

Abstract: In the unsupervised self-evolution of Multimodal Large Language Models (MLLMs), the quality of feedback signals during post-training is pivotal for stable and effective learning. However, existing self-evolution methods rely predominantly on majority voting, selecting the most frequent output as the pseudo-golden answer; this choice may reflect the model's intrinsic biases rather than the objective correctness of the reasoning paths. To counteract the resulting degradation, we propose \textbf{C}ontinuous \textbf{S}oftened \textbf{R}etracing re\textbf{S}ampling (\textbf{CSRS}) for MLLM self-evolution. Specifically, we introduce a Retracing Re-inference Mechanism (\textbf{RRM}), in which the model re-infers from anchor points to expand the exploration of long-tail reasoning paths. Simultaneously, we propose a Softened Frequency Reward (\textbf{SFR}), which replaces binary rewards with continuous signals, calibrating the reward based on each answer's frequency across sampled...
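To make the contrast with binary majority voting concrete, the following is a minimal sketch of a frequency-calibrated soft reward in the spirit of SFR. The function name, the `temperature` parameter, and the power-law softening are illustrative assumptions; the paper's exact calibration is not shown in this abstract.

```python
from collections import Counter

def softened_frequency_reward(answers, temperature=1.0):
    """Assign each sampled answer a continuous reward in (0, 1]
    proportional to its frequency among the samples, instead of a
    binary 1/0 reward for matching the majority-vote answer.
    (Illustrative sketch; not the paper's exact formulation.)"""
    n = len(answers)
    counts = Counter(answers)
    # Empirical frequency of each distinct answer across the n samples.
    freqs = {a: c / n for a, c in counts.items()}
    # Softening (an assumed choice here): exponentiate by 1/temperature,
    # so rewards stay graded rather than collapsing to a hard 0/1 signal.
    return [freqs[a] ** (1.0 / temperature) for a in answers]

# Four sampled answers; the majority answer gets a high but non-saturated
# reward, and the minority (possibly long-tail) answer still gets signal.
rewards = softened_frequency_reward(["42", "42", "17", "42"])
```

Under a binary majority-vote reward the minority answer "17" would receive exactly 0; here it keeps a small continuous reward, which is the kind of graded feedback the abstract attributes to SFR.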