[2510.23868] GIFT: Group-Relative Implicit Fine-Tuning Integrates GRPO with DPO and UNA
Computer Science > Machine Learning
arXiv:2510.23868 (cs)
[Submitted on 27 Oct 2025 (v1), last revised 8 Apr 2026 (this version, v4)]

Title: GIFT: Group-Relative Implicit Fine-Tuning Integrates GRPO with DPO and UNA
Authors: Zhichao Wang

Abstract: This paper proposes \textit{Group-relative Implicit Fine-Tuning (GIFT)}, a reinforcement learning framework for aligning large language models (LLMs) that unifies on-policy optimization with implicit preference learning. GIFT combines three key elements: (1) group-based sampling and normalization from GRPO, (2) the implicit reward formulation of DPO, and (3) the training principle underlying UNA. The central idea is to transform reward maximization into a \textit{group-wise reward matching problem}. By jointly normalizing implicit and explicit rewards within each sampled group, GIFT eliminates the intractable normalization constant associated with implicit rewards and, through the same normalization, reduces sensitivity to the KL-regularization coefficient. This yields a simple mean squared error (MSE) objective between normalized implicit and explicit reward functions, providing a stable and analytically tractable training signal. Unlike offline approaches such as DPO and UNA, GIFT retains exploration through on-policy response sampling. Compared to GRPO, it replaces high-variance re...
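The abstract gives enough detail to sketch the objective in broad strokes: the DPO implicit reward \(\beta \log(\pi_\theta/\pi_{\text{ref}})\) and the explicit reward are each standardized within a group of sampled responses, and the loss is the MSE between the two normalized quantities. The following is a minimal Python sketch based only on that description; the function name, `beta`, and the exact normalization are illustrative assumptions, not the paper's implementation.

```python
import torch

def gift_loss(logp_policy: torch.Tensor,
              logp_ref: torch.Tensor,
              rewards: torch.Tensor,
              beta: float = 0.1) -> torch.Tensor:
    """Illustrative sketch of the GIFT objective as described in the abstract.

    logp_policy, logp_ref: (G,) summed log-probabilities of G responses
        sampled on-policy, under the current policy and a frozen reference.
    rewards: (G,) explicit scalar rewards for the same G responses.
    """
    # DPO-style implicit reward. The intractable per-prompt constant
    # beta * log Z(x) is omitted: it is shared across the group, so it
    # cancels under the mean-subtraction below (the abstract's claim).
    implicit = beta * (logp_policy - logp_ref)

    # Group-wise standardization, as in GRPO's advantage normalization.
    def normalize(x: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
        return (x - x.mean()) / (x.std() + eps)

    # MSE between normalized implicit and explicit rewards.
    return torch.mean((normalize(implicit) - normalize(rewards)) ** 2)
```

Note that in this simplified form the scale `beta` drops out entirely after standardization; the abstract claims only *reduced* sensitivity to the KL coefficient, so the paper's actual objective presumably differs in detail.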