[2602.22601] $ϕ$-DPO: Fairness Direct Preference Optimization Approach to Continual Learning in Large Multimodal Models

arXiv - Machine Learning · 4 min read · Article

Summary

The paper presents the $ϕ$-DPO framework, addressing fairness in continual learning for large multimodal models by optimizing preference signals to mitigate bias and forgetting.

Why It Matters

As AI models increasingly handle diverse data, ensuring fairness in their learning processes is crucial. This research tackles the underexplored issue of data imbalance, which can lead to biased outcomes, thereby contributing to more equitable AI systems.

Key Takeaways

  • Introduces the $ϕ$-DPO framework to enhance fairness in continual learning.
  • Addresses catastrophic forgetting and data imbalance in large multimodal models.
  • Demonstrates state-of-the-art performance through extensive experiments.

Computer Science > Machine Learning · arXiv:2602.22601 (cs) · [Submitted on 26 Feb 2026]

Title: $\phi$-DPO: Fairness Direct Preference Optimization Approach to Continual Learning in Large Multimodal Models

Authors: Thanh-Dat Truong, Huu-Thien Tran, Jackson Cothren, Bhiksha Raj, Khoa Luu

Abstract: Fairness in Continual Learning for Large Multimodal Models (LMMs) is an emerging yet underexplored challenge, particularly in the presence of imbalanced data distributions that can lead to biased model updates and suboptimal performance across tasks. While recent continual learning studies have made progress in addressing catastrophic forgetting, the problem of fairness caused by imbalanced data remains largely unaddressed. This paper presents a novel Fairness Direct Preference Optimization (FaiDPO, or $\phi$-DPO) framework for continual learning in LMMs. In particular, we first propose a new continual learning paradigm based on Direct Preference Optimization (DPO) to mitigate catastrophic forgetting by aligning learning with pairwise preference signals. Then, we identify the limitations of conventional DPO on imbalanced data and present a new $\phi$-DPO loss that explicitly addresses distributional biases. We provide a comprehensive theoretical analysis demonstrating that our appr...
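
The abstract only sketches the method, but the standard DPO objective it builds on is well documented (Rafailov et al., 2023). Below is a minimal PyTorch sketch of that objective, plus a hypothetical fairness-reweighted variant to illustrate the kind of imbalance correction the paper describes. The function names, the `pair_weights` parameter, and the weighting scheme are assumptions for illustration, not the paper's actual $\phi$-DPO loss.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss: widen the margin between preferred and
    dispreferred responses, measured as log-prob ratios of the policy
    against a frozen reference model."""
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    margin = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(margin).mean()

def phi_dpo_loss(policy_chosen_logps, policy_rejected_logps,
                 ref_chosen_logps, ref_rejected_logps,
                 pair_weights, beta=0.1):
    """Hypothetical fairness-weighted variant (illustrative only):
    reweight each pair's contribution, e.g. by inverse group frequency,
    so that under-represented groups are not drowned out by majority
    data when the preference dataset is imbalanced."""
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    margin = beta * (chosen_logratio - rejected_logratio)
    per_pair = -F.logsigmoid(margin)
    return (pair_weights * per_pair).sum() / pair_weights.sum()

# Example: a batch of 3 preference pairs, with the minority-group
# pair upweighted 3x (weights here are made up for the demo).
w = torch.tensor([1.0, 1.0, 3.0])
loss = phi_dpo_loss(torch.randn(3), torch.randn(3),
                    torch.randn(3), torch.randn(3), w)
```

Under uniform weights the second function reduces to the first, which is the usual sanity check for this style of reweighting; the paper's theoretical analysis of its actual loss is not reproduced in the excerpt above.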

Related Articles

Machine Learning

AI Has Flooded All the Weather Apps | WIRED

Weather forecasting has gotten a big boost from machine learning. How that translates into what users see can vary.

Wired - AI · 8 min ·
LLMs

What I learned about multi-agent coordination running 9 specialized Claude agents

I've been experimenting with multi-agent AI systems and ended up building something more ambitious than I originally planned: a fully ope...

Reddit - Artificial Intelligence · 1 min ·
Machine Learning

The AI Chip War is Just Getting Started

Everyone talks about AI models, but the real bottleneck might be hardware. According to a recent study by Roots Analysis: AI chip market ...

Reddit - Artificial Intelligence · 1 min ·
Machine Learning

Exclusive: Runway launches $10M fund, Builders program to support early stage AI startups | TechCrunch

Runway is launching a $10 million fund and startup program to back companies building with its AI video models, as it pushes toward inter...

TechCrunch - AI · 7 min ·