[2603.21289] When Models Judge Themselves: Unsupervised Self-Evolution for Multimodal Reasoning
Computer Science > Computer Vision and Pattern Recognition
arXiv:2603.21289 (cs)
[Submitted on 22 Mar 2026]

Title: When Models Judge Themselves: Unsupervised Self-Evolution for Multimodal Reasoning
Authors: Zhengxian Wu, Kai Shi, Chuanrui Zhang, Zirui Liao, Jun Yang, Ni Yang, Qiuying Peng, Luyuan Zhang, Hangrui Xu, Tianhuang Su, Zhenyu Yang, Haonan Lu, Haoqian Wang

Abstract: Recent progress in multimodal large language models has led to strong performance on reasoning tasks, but these improvements largely rely on high-quality annotated data or teacher-model distillation, both of which are costly and difficult to scale. To address this, we propose an unsupervised self-evolution training framework for multimodal reasoning that achieves stable performance improvements without using human-annotated answers or external reward models. For each input, we sample multiple reasoning trajectories and jointly model their within-group consistency. We use the Actor's self-consistency signal as a training prior and introduce a bounded, Judge-based modulation to continuously reweight trajectories of different quality. We further model the modulated scores as a group-level distribution and convert absolute scores into relative advantages within each group, enabling more robust policy updates. Trained with Group...
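The group-level normalization step the abstract describes — modulating per-trajectory self-consistency scores with a bounded Judge signal, then converting the modulated scores into relative advantages within each sampled group — can be sketched as below. This is a minimal illustrative sketch only: the function and parameter names (`group_relative_advantages`, `judge_weights`, `clip`) are assumptions, and the paper's exact modulation and normalization formulas are not given in the abstract.

```python
import statistics

def group_relative_advantages(scores, judge_weights, clip=0.5):
    """Turn absolute per-trajectory scores into within-group relative advantages.

    scores        -- self-consistency-based scores for one input's sampled
                     reasoning trajectories (names are hypothetical).
    judge_weights -- Judge-based modulation factors for the same trajectories.
    clip          -- bound on the modulation, keeping it a reweighting
                     rather than letting the Judge dominate the prior.
    """
    # Bounded Judge-based modulation: clamp each weight to [1-clip, 1+clip]
    # before applying it to the Actor's self-consistency score.
    modulated = [
        s * max(1.0 - clip, min(1.0 + clip, w))
        for s, w in zip(scores, judge_weights)
    ]
    # Treat the modulated scores as a group-level distribution and
    # standardize them: advantage = (score - group mean) / group std.
    mean = statistics.fmean(modulated)
    std = statistics.pstdev(modulated) or 1.0  # guard against zero variance
    return [(m - mean) / std for m in modulated]
```

With neutral Judge weights, a trajectory scored above the group mean gets a positive advantage and one below the mean a negative advantage, so the policy update depends only on relative standing within the group, not on the absolute score scale.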