[2604.04917] Vero: An Open RL Recipe for General Visual Reasoning
Computer Science > Computer Vision and Pattern Recognition

arXiv:2604.04917 (cs) [Submitted on 6 Apr 2026]

Title: Vero: An Open RL Recipe for General Visual Reasoning

Authors: Gabriel Sarch, Linrong Cai, Qunzhong Wang, Haoyang Wu, Danqi Chen, Zhuang Liu

Abstract: What does it take to build a visual reasoner that works across charts, science, spatial understanding, and open-ended tasks? The strongest vision-language models (VLMs) show that such broad visual reasoning is within reach, but the recipe behind them remains unclear, locked behind proprietary reinforcement learning (RL) pipelines with non-public data. We introduce Vero, a family of fully open VLMs that matches or exceeds existing open-weight models across diverse visual reasoning tasks. We scale RL data and rewards across six broad task categories, constructing Vero-600K, a 600K-sample dataset drawn from 59 datasets, and designing task-routed rewards that handle heterogeneous answer formats. Vero achieves state-of-the-art performance, improving over four base models by 3.7-5.5 points on average across VeroEval, our suite of 30 challenging benchmarks. Starting from Qwen3-VL-8B-Instruct, Vero outperforms Qwen3-VL-8B-Thinking on 23 of 30 benchmarks without additional proprietary thinking data. When trained from the same base model, Vero-600K exceeds existing RL datasets across task ...
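The abstract mentions "task-routed rewards that handle heterogeneous answer formats." As a minimal illustrative sketch (not the paper's implementation; all function names and scoring rules below are hypothetical assumptions), such a scheme can dispatch each sample's model answer to a reward function chosen by the sample's task category:

```python
# Hypothetical sketch of task-routed rewards: each RL sample carries a task
# category, and a router maps it to a category-specific reward function.
# The categories and scoring rules here are illustrative, not from the paper.

def reward_multiple_choice(pred: str, gold: str) -> float:
    # Exact-letter match for choice-style answers (e.g. "B").
    return 1.0 if pred.strip().upper() == gold.strip().upper() else 0.0

def reward_numeric(pred: str, gold: str, tol: float = 1e-2) -> float:
    # Tolerant numeric comparison for chart/science answers.
    try:
        return 1.0 if abs(float(pred) - float(gold)) <= tol else 0.0
    except ValueError:
        return 0.0

def reward_free_form(pred: str, gold: str) -> float:
    # Crude token-overlap score for open-ended answers; a real system
    # would likely use a stronger matcher or judge model.
    p, g = set(pred.lower().split()), set(gold.lower().split())
    return len(p & g) / max(len(g), 1)

ROUTER = {
    "choice": reward_multiple_choice,
    "numeric": reward_numeric,
    "open": reward_free_form,
}

def routed_reward(task: str, pred: str, gold: str) -> float:
    # Dispatch on the task category attached to the training sample.
    return ROUTER[task](pred, gold)

print(routed_reward("choice", "b", "B"))  # case-insensitive match -> 1.0
```

The design point is that a single verifier cannot score a letter choice, a number read off a chart, and a free-form explanation with one rule; routing lets each format keep an appropriate reward without changing the RL loop.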