[2509.25541] Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play
Computer Science > Computer Vision and Pattern Recognition
arXiv:2509.25541 (cs)
[Submitted on 29 Sep 2025 (v1), last revised 4 Mar 2026 (this version, v2)]

Title: Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play
Authors: Qinsi Wang, Bo Liu, Tianyi Zhou, Jing Shi, Yueqian Lin, Yiran Chen, Hai Helen Li, Kun Wan, Wentian Zhao

Abstract: Although reinforcement learning (RL) has emerged as a promising approach for improving vision-language models (VLMs) and multimodal large language models (MLLMs), current methods rely heavily on manually curated datasets and costly human verification, which limits scalable self-improvement in multimodal systems. To address this challenge, we propose Vision-Zero, a label-free, domain-agnostic multi-agent self-play framework for self-evolving VLMs through competitive visual games generated from arbitrary image inputs. Specifically, Vision-Zero encompasses three main attributes: (1) Strategic Self-Play Framework: Vision-Zero trains VLMs in "Who Is the Spy"-style games, where the models engage in strategic reasoning and actions across multiple roles. Through interactive gameplay, models autonomously generate their training data without human annotation. (2) Gameplay from Arbitrary Images: Unlike existing gamified frameworks, Vision-Zero can generate games ...
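The self-play loop the abstract describes can be illustrated with a minimal sketch. This is a hypothetical toy, not the paper's implementation: `describe` and `vote` stand in for VLM calls on the shared versus subtly altered image, the roles and majority-vote rule are assumptions, and the key point is only that game outcomes yield reward-labeled trajectories with no human annotation.

```python
import random

def play_round(num_players, describe, vote, rng):
    """One 'Who Is the Spy'-style self-play round (hypothetical sketch).

    describe(player, is_spy) -> clue string; vote(player, clues) -> accused index.
    Both are stand-ins for VLM calls, not the paper's actual interface.
    """
    spy = rng.randrange(num_players)              # one player sees the altered image
    clues = [describe(i, i == spy) for i in range(num_players)]
    votes = [vote(i, clues) for i in range(num_players)]
    accused = max(set(votes), key=votes.count)    # plurality vote decides the accusation
    civilians_win = accused == spy
    # The game outcome labels every trajectory with a reward, label-free.
    return [
        {"player": i,
         "role": "spy" if i == spy else "civilian",
         "clue": clues[i],
         "reward": float(civilians_win == (i != spy))}
        for i in range(num_players)
    ]

# Toy usage with deterministic stubs in place of a real VLM:
rng = random.Random(0)
trajectories = play_round(
    num_players=4,
    describe=lambda i, is_spy: f"clue-{'spy' if is_spy else 'civ'}-{i}",
    vote=lambda i, clues: next(j for j, c in enumerate(clues) if "spy" in c),
    rng=rng,
)
```

In a real training loop, the returned trajectories would feed an RL update (e.g., a policy-gradient step) on the VLM that played both roles.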