[2603.23086] Policy-based Tuning of Autoregressive Image Models with Instance- and Distribution-Level Rewards
Computer Science > Machine Learning

arXiv:2603.23086 (cs)

[Submitted on 24 Mar 2026]

Title: Policy-based Tuning of Autoregressive Image Models with Instance- and Distribution-Level Rewards

Authors: Orhun Buğra Baran, Melih Kandemir, Ramazan Gokberk Cinbis

Abstract: Autoregressive (AR) models are highly effective for image generation, yet their standard maximum-likelihood training does not directly optimize for sample quality or diversity. While reinforcement learning (RL) has been used to align diffusion models, these methods typically suffer from a collapse in output diversity. Similarly, concurrent RL methods for AR models rely strictly on instance-level rewards, often trading distributional coverage for quality. To address these limitations, we propose a lightweight RL framework that casts token-based AR synthesis as a Markov Decision Process, optimized via Group Relative Policy Optimization (GRPO). Our core contribution is a novel distribution-level Leave-One-Out FID (LOO-FID) reward; by leveraging an exponential moving average of feature moments, it explicitly encourages sample diversity and prevents mode collapse during policy updates. We integrate this with composite instance-level rewards (CLIP and HPSv2) for strict semantic and perceptual fidelity, and stabi...
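For readers unfamiliar with GRPO, the sketch below illustrates the mechanism the abstract refers to: rewards are normalized within a group of samples drawn for the same prompt, and the resulting group-relative advantages drive a PPO-style clipped surrogate objective. This is a generic PyTorch illustration of GRPO, not the paper's code; the function names and clipping details are assumptions, and the paper may apply the objective per token or add a KL penalty.

```python
# Hypothetical sketch of GRPO's group-relative advantages and clipped loss.
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Normalize rewards within each group (one group = G samples for one
    prompt). rewards: (num_groups, G). No learned value function is needed."""
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + 1e-8)

def grpo_loss(logp_new: torch.Tensor,
              logp_old: torch.Tensor,
              advantages: torch.Tensor,
              clip_eps: float = 0.2) -> torch.Tensor:
    """PPO-style clipped surrogate. logp_new/logp_old are per-sample sequence
    log-probabilities under the current and sampling policies."""
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.minimum(unclipped, clipped).mean()
```

Because advantages are centered within each group, a sample only gains positive advantage by outscoring its siblings for the same prompt, which is what makes GRPO lightweight (no critic) relative to PPO.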
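The LOO-FID reward can be pictured as follows: maintain an EMA of reference feature moments (mean and covariance), then score each generated sample by how much its removal changes the group's Fréchet distance to those moments; a sample whose removal raises FID was helping coverage and earns a positive reward. The sketch below is a speculative reconstruction from the abstract alone; all names (frechet_distance, update_ema_moments, loo_fid_rewards) and the exact sign convention are assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a Leave-One-Out FID (LOO-FID) style reward over
# EMA feature moments, reconstructed from the abstract's description.
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between two Gaussians, as in standard FID."""
    diff = mu1 - mu2
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    covmean = covmean.real  # drop tiny imaginary parts from sqrtm
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))

def update_ema_moments(ema_mu, ema_sigma, feats, decay=0.99):
    """EMA over batch feature moments; feats: (n, d) features, n >= 2."""
    mu = feats.mean(axis=0)
    sigma = np.cov(feats, rowvar=False)
    return (decay * ema_mu + (1 - decay) * mu,
            decay * ema_sigma + (1 - decay) * sigma)

def loo_fid_rewards(feats, ref_mu, ref_sigma):
    """reward_i = FID without sample i minus full-group FID, against the
    EMA reference moments. Requires group size n >= 3 so the leave-one-out
    covariance is defined."""
    n = feats.shape[0]
    full = frechet_distance(feats.mean(axis=0),
                            np.cov(feats, rowvar=False), ref_mu, ref_sigma)
    rewards = np.empty(n)
    for i in range(n):
        loo = np.delete(feats, i, axis=0)
        rewards[i] = frechet_distance(loo.mean(axis=0),
                                      np.cov(loo, rowvar=False),
                                      ref_mu, ref_sigma) - full
    return rewards
```

Under this reading, the EMA keeps the reference moments stable across small policy-update batches, which is plausibly what lets a distribution-level statistic like FID act as a per-sample reward without collapsing the group toward a single high-scoring mode.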