[2604.01840] Not All Tokens See Equally: Perception-Grounded Policy Optimization for Large Vision-Language Models
Computer Science > Artificial Intelligence

arXiv:2604.01840 (cs)

[Submitted on 2 Apr 2026 (v1), last revised 8 Apr 2026 (this version, v2)]

Title: Not All Tokens See Equally: Perception-Grounded Policy Optimization for Large Vision-Language Models

Authors: Zekai Ye, Qiming Li, Xiaocheng Feng, Ruihan Chen, Ziming Li, Haoyu Ren, Kun Chen, Dandan Tu, Bing Qin

Abstract: While Reinforcement Learning from Verifiable Rewards (RLVR) has advanced reasoning in Large Vision-Language Models (LVLMs), prevailing frameworks suffer from a foundational methodological flaw: by distributing identical advantages across all generated tokens, these methods dilute the learning signal essential for optimizing the critical, visually grounded steps of multimodal reasoning. To bridge this gap, we formulate \textit{Token Visual Dependency}, which quantifies the causal information gain from visual inputs as the Kullback-Leibler (KL) divergence between the visual-conditioned and text-only predictive distributions. Observing that this dependency is highly sparse and semantically pivotal, we introduce Perception-Grounded Policy Optimization (PGPO), a novel fine-grained credit-assignment framework that dynamically reshapes advantages at the token level. Through a threshold-gated, mass-conserving mechanism, ...
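The two quantities the abstract names can be made concrete. Below is a minimal PyTorch sketch, not the paper's implementation: it computes per-token visual dependency as KL(p_visual || p_text_only) between next-token distributions decoded with and without the image, then applies one plausible threshold-gated, mass-conserving advantage reshaping. The function names, the threshold `tau`, the gain `beta`, and the mean-one weight renormalization are all illustrative assumptions; the abstract only names the mechanism without specifying it.

    import torch
    import torch.nn.functional as F

    def token_visual_dependency(logits_visual, logits_text_only):
        # Per-token KL(p_visual || p_text_only) over the vocabulary.
        # logits_*: (seq_len, vocab_size) next-token logits from the same
        # policy, decoded with and without the image in the context.
        log_p_vis = F.log_softmax(logits_visual, dim=-1)
        log_p_txt = F.log_softmax(logits_text_only, dim=-1)
        return (log_p_vis.exp() * (log_p_vis - log_p_txt)).sum(dim=-1)

    def reshape_advantages(advantage, dependency, tau=0.1, beta=1.0):
        # Start from the uniform per-token advantage that standard RLVR
        # assigns (the same scalar at every position), then up-weight
        # tokens whose visual dependency exceeds the gate threshold `tau`.
        seq_len = dependency.shape[0]
        base = torch.full((seq_len,), advantage)
        gate = (dependency > tau).float()              # threshold gate
        weights = 1.0 + beta * gate * dependency       # boost grounded tokens
        weights = weights * (seq_len / weights.sum())  # renormalize to mean 1
        # Mass conservation: sum(base * weights) == advantage * seq_len,
        # so the sequence's total advantage is redistributed, not changed.
        return base * weights

Usage would look like dep = token_visual_dependency(logits_v, logits_t) followed by adv = reshape_advantages(group_advantage, dep). The mean-one renormalization is one simple way to guarantee mass conservation: grounded tokens gain advantage exactly at the expense of ungrounded ones, leaving the sequence-level signal intact.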