Vision Language Model Alignment in TRL ⚡️

Hugging Face Blog · 14 min read


Published August 7, 2025

Authors: Sergio Paniego (sergiopaniego), Merve Noyan (merve), Quentin Gallouédec (qgallouedec), Kashif Rasul (kashif), Aritra Roy Gosthipaty (ariG23498)

Introduction

Vision Language Models (VLMs) are getting stronger, but aligning them with human preferences still matters. In TRL, we have already shown how to post-train VLMs with Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO). This time, we're going further.

tl;dr

Here's what's new in TRL:

- Mixed Preference Optimization (MPO)
- Group Relative Policy Optimization (GRPO)
- Group Sequence Policy Optimization (GSPO), a variant of GRPO

These go beyond pairwise DPO, extracting richer signals from preference data and scaling better with modern VLMs.

We've also extended existing methods to support VLMs:

- REINFORCE Leave-One-Out (RLOO)
- Online Direct Preference Optimization (Online DPO)

This enables more efficient and scalable multimodal alignment.

Finally:

- Native Supervised Fine-Tuning support for Vision Language Models
- Training scripts and demo notebooks to help you get started quickly

Table of Contents

- Multimodal Alignment for VLMs in TRL ⚡️
- Introduction
- Alignment for Vision Language Models
- Mixed Preference Optimization (MPO)
- Multimodal Group Relative Policy Optimization (GRPO)
- Group Sequence Policy Optimization (GSPO)
- Comparison
- Further Extensions for VLMs
- REINFORCE Leave-One-Out (RLOO)
- Online Di...
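To give a feel for what distinguishes these methods, here is a minimal, illustrative sketch of two ideas mentioned above: GRPO's group-relative advantage (each sampled completion's reward is normalized against the other completions for the same prompt, so no learned value model is needed) and GSPO's sequence-level importance ratio (per-token log-ratios are averaged over the sequence before exponentiating, rather than weighting each token separately). This is a pedagogical sketch, not TRL's implementation; in practice TRL's `GRPOTrainer` handles these computations internally, and the function names here are our own.

```python
import math

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize each completion's reward against
    the mean and standard deviation of its sampling group. Illustrative
    sketch only; TRL computes this inside GRPOTrainer."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]

def sequence_importance_ratio(token_logps_new, token_logps_old):
    """GSPO-style ratio: average the per-token log-ratios over the
    sequence length, then exponentiate once, instead of forming a
    separate ratio per token as GRPO's objective does."""
    n = len(token_logps_new)
    avg_log_ratio = sum(new - old for new, old
                        in zip(token_logps_new, token_logps_old)) / n
    return math.exp(avg_log_ratio)

# Four sampled completions for one prompt, scored by a reward function:
advs = group_relative_advantages([1.0, 0.5, 0.0, 0.5])
print([round(a, 3) for a in advs])  # [1.414, 0.0, -1.414, 0.0]
```

Note that the advantages within a group always sum to (approximately) zero: completions are only rewarded for being better than their siblings, which is what lets GRPO skip the critic network entirely.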


