[2503.12575] BalancedDPO: Adaptive Multi-Metric Alignment
Computer Science > Computer Vision and Pattern Recognition
arXiv:2503.12575 (cs)
[Submitted on 16 Mar 2025 (v1), last revised 5 Apr 2026 (this version, v2)]

Title: BalancedDPO: Adaptive Multi-Metric Alignment
Authors: Dipesh Tamboli, Souradip Chakraborty, Aditya Malusare, Biplab Banerjee, Amrit Singh Bedi, Vaneet Aggarwal

Abstract: Diffusion models have achieved remarkable progress in text-to-image generation, yet aligning them with human preference remains challenging due to the presence of multiple, sometimes conflicting, evaluation metrics (e.g., semantic consistency, aesthetics, and human preference scores). Existing alignment methods typically optimize for a single metric or rely on scalarized reward aggregation, which can bias the model toward specific evaluation criteria. To address this challenge, we propose BalancedDPO, a framework that achieves multi-metric preference alignment within the Direct Preference Optimization (DPO) paradigm. Unlike prior DPO variants that rely on a single metric, BalancedDPO introduces a majority-vote consensus over multiple preference scorers and integrates it directly into the DPO training loop with dynamic reference model updates. This consensus-based formulation avoids reward-scale conflicts and ensures more stable gradient directions across heterogeneous metrics. Experiments on Pick-a-Pic, Par...
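The majority-vote consensus described in the abstract can be sketched as follows. This is an illustrative reading of the idea, not the authors' code: each metric casts one vote on a pair of candidates, so heterogeneous reward scales never have to be combined numerically — only the sign of each per-metric comparison matters. The scorer names and score values below are hypothetical.

```python
# Sketch of majority-vote preference consensus over multiple scorers.
# Not the BalancedDPO implementation; scorer names/values are made up.
from typing import Callable, Dict, List, Tuple

Scorer = Callable[[str], float]  # maps a candidate image (here, an id) to a score

def majority_vote(cand_a: str, cand_b: str, scorers: List[Scorer]) -> Tuple[str, str]:
    """Return (winner, loser) for a candidate pair by counting per-metric votes.

    Each scorer contributes exactly one vote, so a metric with a large
    numeric range cannot dominate the others, avoiding reward-scale conflicts.
    """
    votes_a = sum(1 for score in scorers if score(cand_a) > score(cand_b))
    votes_b = len(scorers) - votes_a
    return (cand_a, cand_b) if votes_a >= votes_b else (cand_b, cand_a)

# Toy example: three hypothetical metrics with very different scales.
scores: Dict[str, Dict[str, float]] = {
    "img_a": {"clip": 0.31, "aesthetic": 5.2, "pickscore": 21.0},
    "img_b": {"clip": 0.29, "aesthetic": 5.9, "pickscore": 22.5},
}
scorers: List[Scorer] = [
    lambda img: scores[img]["clip"],
    lambda img: scores[img]["aesthetic"],
    lambda img: scores[img]["pickscore"],
]
winner, loser = majority_vote("img_a", "img_b", scorers)
print(winner)  # img_b: preferred by 2 of 3 metrics despite losing on CLIP score
```

In a DPO-style training loop, the resulting (winner, loser) pair would then supply the preferred and dispreferred samples for the preference loss; the abstract additionally mentions dynamic reference model updates, which this sketch does not cover.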