[2603.23086] Policy-based Tuning of Autoregressive Image Models with Instance- and Distribution-Level Rewards
Computer Science > Machine Learning

arXiv:2603.23086 (cs)

[Submitted on 24 Mar 2026]

Title: Policy-based Tuning of Autoregressive Image Models with Instance- and Distribution-Level Rewards

Authors: Orhun Buğra Baran, Melih Kandemir, Ramazan Gokberk Cinbis

Abstract: Autoregressive (AR) models are highly effective for image generation, yet their standard maximum-likelihood training does not directly optimize for sample quality or diversity. While reinforcement learning (RL) has been used to align diffusion models, these methods typically suffer from a collapse in output diversity. Similarly, concurrent RL methods for AR models rely strictly on instance-level rewards, often trading distributional coverage for quality. To address these limitations, we propose a lightweight RL framework that casts token-based AR synthesis as a Markov Decision Process, optimized via Group Relative Policy Optimization (GRPO). Our core contribution is a novel distribution-level Leave-One-Out FID (LOO-FID) reward; by leveraging an exponential moving average of feature moments, it explicitly encourages sample diversity and prevents mode collapse during policy updates. We integrate this with composite instance-level rewards (CLIP and HPSv2) for strict semantic and perceptual fidelity, and stabi...
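For readers unfamiliar with GRPO, the sketch below illustrates the mechanism the abstract refers to: rewards are normalized within a group of samples drawn for the same prompt, and the resulting group-relative advantages drive a PPO-style clipped surrogate objective. This is a generic PyTorch illustration of GRPO, not the paper's code; the function names and clipping details are assumptions, and the paper may apply the objective per token or add a KL penalty.

```python
# Hypothetical sketch of GRPO's group-relative advantages and clipped loss.
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Normalize rewards within each group (one group = G samples for one
    prompt). rewards: (num_groups, G). No learned value function is needed."""
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + 1e-8)

def grpo_loss(logp_new: torch.Tensor,
              logp_old: torch.Tensor,
              advantages: torch.Tensor,
              clip_eps: float = 0.2) -> torch.Tensor:
    """PPO-style clipped surrogate. logp_new/logp_old are per-sample sequence
    log-probabilities under the current and sampling policies."""
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.minimum(unclipped, clipped).mean()
```

Because advantages are centered within each group, a sample only gains positive advantage by outscoring its siblings for the same prompt, which is what makes GRPO lightweight (no critic) relative to PPO.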
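The LOO-FID reward can be pictured as follows: maintain an EMA of reference feature moments (mean and covariance), then score each generated sample by how much its removal changes the group's Fréchet distance to those moments; a sample whose removal raises FID was helping coverage and earns a positive reward. The sketch below is a speculative reconstruction from the abstract alone; all names (frechet_distance, update_ema_moments, loo_fid_rewards) and the exact sign convention are assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a Leave-One-Out FID (LOO-FID) style reward over
# EMA feature moments, reconstructed from the abstract's description.
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between two Gaussians, as in standard FID."""
    diff = mu1 - mu2
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    covmean = covmean.real  # drop tiny imaginary parts from sqrtm
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))

def update_ema_moments(ema_mu, ema_sigma, feats, decay=0.99):
    """EMA over batch feature moments; feats: (n, d) features, n >= 2."""
    mu = feats.mean(axis=0)
    sigma = np.cov(feats, rowvar=False)
    return (decay * ema_mu + (1 - decay) * mu,
            decay * ema_sigma + (1 - decay) * sigma)

def loo_fid_rewards(feats, ref_mu, ref_sigma):
    """reward_i = FID without sample i minus full-group FID, against the
    EMA reference moments. Requires group size n >= 3 so the leave-one-out
    covariance is defined."""
    n = feats.shape[0]
    full = frechet_distance(feats.mean(axis=0),
                            np.cov(feats, rowvar=False), ref_mu, ref_sigma)
    rewards = np.empty(n)
    for i in range(n):
        loo = np.delete(feats, i, axis=0)
        rewards[i] = frechet_distance(loo.mean(axis=0),
                                      np.cov(loo, rowvar=False),
                                      ref_mu, ref_sigma) - full
    return rewards
```

Under this reading, the EMA keeps the reference moments stable across small policy-update batches, which is plausibly what lets a distribution-level statistic like FID act as a per-sample reward without collapsing the group toward a single high-scoring mode.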