[2602.22601] $\phi$-DPO: Fairness Direct Preference Optimization Approach to Continual Learning in Large Multimodal Models
Summary
The paper presents $\phi$-DPO (Fairness Direct Preference Optimization), a framework that addresses fairness in continual learning for large multimodal models by aligning learning with pairwise preference signals to mitigate both distributional bias and catastrophic forgetting.
Why It Matters
As AI models increasingly handle diverse data, ensuring fairness in their learning processes is crucial. This research tackles the underexplored issue of data imbalance in continual learning, which can bias model updates and degrade performance across tasks, contributing to more equitable AI systems.
Key Takeaways
- Introduces the $\phi$-DPO framework to enhance fairness in continual learning.
- Addresses catastrophic forgetting and data imbalance in large multimodal models.
- Demonstrates state-of-the-art performance through extensive experiments.
Computer Science > Machine Learning
arXiv:2602.22601 (cs)
[Submitted on 26 Feb 2026]
Title: $\phi$-DPO: Fairness Direct Preference Optimization Approach to Continual Learning in Large Multimodal Models
Authors: Thanh-Dat Truong, Huu-Thien Tran, Jackson Cothren, Bhiksha Raj, Khoa Luu
Abstract: Fairness in Continual Learning for Large Multimodal Models (LMMs) is an emerging yet underexplored challenge, particularly in the presence of imbalanced data distributions that can lead to biased model updates and suboptimal performance across tasks. While recent continual learning studies have made progress in addressing catastrophic forgetting, the problem of fairness caused by imbalanced data remains largely underexplored. This paper presents a novel Fairness Direct Preference Optimization (FaiDPO or $\phi$-DPO) framework for continual learning in LMMs. In particular, we first propose a new continual learning paradigm based on Direct Preference Optimization (DPO) to mitigate catastrophic forgetting by aligning learning with pairwise preference signals. Then, we identify the limitations of conventional DPO under imbalanced data and present a new $\phi$-DPO loss that explicitly addresses distributional biases. We provide a comprehensive theoretical analysis demonstrating that our approach...
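The abstract does not give the exact form of the $\phi$-DPO loss, but the standard pairwise DPO objective it builds on is well established. The sketch below shows a minimal PyTorch implementation of that standard DPO loss, plus a hypothetical per-pair `fairness_weights` argument standing in for the kind of distribution-aware correction $\phi$-DPO applies; the weighting scheme here is illustrative only and is not the paper's actual loss.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps,
             beta=0.1, fairness_weights=None):
    """Standard pairwise DPO loss, optionally reweighted per pair.

    Each *_logps tensor holds the summed log-probability of the chosen or
    rejected response under the trainable policy or the frozen reference
    model. `fairness_weights` is a HYPOTHETICAL per-pair weight standing in
    for phi-DPO's distribution-aware correction; the actual phi-DPO loss is
    defined in the paper, not here.
    """
    # Implicit reward: scaled log-ratio of policy to reference.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)

    # Standard DPO: negative log-sigmoid of the reward margin.
    losses = -F.logsigmoid(chosen_rewards - rejected_rewards)

    if fairness_weights is not None:
        # Illustrative: upweight pairs from under-represented tasks/groups.
        losses = fairness_weights * losses

    return losses.mean()

# Toy usage with random log-probabilities for a batch of 4 preference pairs.
torch.manual_seed(0)
pol_c, pol_r = torch.randn(4), torch.randn(4)
ref_c, ref_r = torch.randn(4), torch.randn(4)
weights = torch.tensor([2.0, 1.0, 1.0, 0.5])  # e.g. inverse group frequency
print(dpo_loss(pol_c, pol_r, ref_c, ref_r, fairness_weights=weights))
```

With uniform weights this reduces exactly to conventional DPO; a non-uniform weighting is one simple way a loss could counteract the imbalanced data distributions the abstract identifies as the source of biased updates.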