[2602.13055] Curriculum-DPO++: Direct Preference Optimization via Data and Model Curricula for Text-to-Image Generation
Summary
The paper presents Curriculum-DPO++, a preference-optimization method for text-to-image generation that combines a data-level curriculum (ordering preference pairs by difficulty) with a model-level curriculum (gradually increasing the trainable capacity of the denoising network), improving training efficiency and performance.
Why It Matters
This research addresses a limitation of existing preference optimization methods, which treat all preference pairs as equally easy to learn, by introducing a structured, easy-to-hard learning approach that improves the quality of generated images. By dynamically adjusting the model's learning capacity during training, it offers a more effective way to train models, which is relevant for advancements in generative AI and computer vision.
Key Takeaways
- Curriculum-DPO++ enhances preference optimization in text-to-image generation.
- The method combines data-level and model-level curricula for improved training.
- Dynamic capacity adjustment of the model leads to better performance on benchmarks.
- Outperforms existing methods in text alignment, aesthetics, and human preference.
- Code for the implementation is publicly available for further research.
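To make the data-level curriculum concrete, here is a minimal sketch of ordering preference pairs from easy to hard before batching. The difficulty score is an assumption for illustration (e.g., how close the winning and losing images score under a reward model); the paper's exact difficulty measure may differ.

```python
# Hypothetical sketch: feed DPO preference pairs in easy-to-hard order.
# "difficulty" is an assumed per-pair scalar (lower = easier); the paper's
# actual difficulty measure is not reproduced here.

def curriculum_batches(pairs, difficulty, batch_size):
    """Yield batches of preference pairs sorted from easy to hard.

    pairs      -- list of (prompt, preferred_image, rejected_image) tuples
    difficulty -- list of floats, one per pair (lower = easier)
    """
    order = sorted(range(len(pairs)), key=lambda i: difficulty[i])
    for start in range(0, len(order), batch_size):
        yield [pairs[i] for i in order[start:start + batch_size]]

# Usage: three pairs with illustrative difficulty scores.
pairs = [("p1", "win_a", "lose_a"),
         ("p2", "win_b", "lose_b"),
         ("p3", "win_c", "lose_c")]
scores = [0.9, 0.1, 0.5]
batches = list(curriculum_batches(pairs, scores, batch_size=2))
# The first batch holds the two easiest pairs (p2, then p3).
```

In a real training loop the batches would be consumed epoch by epoch, with harder pairs only appearing once the model has seen the easier ones.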
Computer Science > Computer Vision and Pattern Recognition
arXiv:2602.13055 (cs) [Submitted on 13 Feb 2026]
Title: Curriculum-DPO++: Direct Preference Optimization via Data and Model Curricula for Text-to-Image Generation
Authors: Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, Nicu Sebe, Mubarak Shah
Abstract: Direct Preference Optimization (DPO) has been proposed as an effective and efficient alternative to reinforcement learning from human feedback (RLHF). However, neither RLHF nor DPO takes into account the fact that learning certain preferences is more difficult than learning other preferences, rendering the optimization process suboptimal. To address this gap in text-to-image generation, we recently proposed Curriculum-DPO, a method that organizes image pairs by difficulty. In this paper, we introduce Curriculum-DPO++, an enhanced method that combines the original data-level curriculum with a novel model-level curriculum. More precisely, we propose to dynamically increase the learning capacity of the denoising network as training advances. We implement this capacity increase via two mechanisms. First, we initialize the model with only a subset of the trainable layers used in the original Curriculum-DPO. As training progresses, we sequentially unfreeze ...
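The sequential-unfreezing mechanism described in the abstract can be sketched as a simple step-indexed schedule. The layer names, the initial number of trainable layers, and the unfreezing interval below are illustrative assumptions, not the paper's actual configuration.

```python
# Hypothetical sketch of the model-level curriculum: start with only a
# subset of layers trainable, then sequentially unfreeze the remaining
# layers on a fixed step schedule. All names and numbers are assumptions.

class Layer:
    """Stand-in for one trainable block of the denoising network."""
    def __init__(self, name):
        self.name = name
        self.trainable = False

def make_unfreeze_schedule(layers, initial_trainable, unfreeze_every):
    """Mark the first `initial_trainable` layers trainable and return a
    hook that unfreezes one additional layer every `unfreeze_every` steps."""
    for layer in layers[:initial_trainable]:
        layer.trainable = True

    def step_hook(step):
        # Number of layers that should be trainable at this training step.
        target = min(len(layers), initial_trainable + step // unfreeze_every)
        for layer in layers[:target]:
            layer.trainable = True

    return step_hook

# Usage: a 4-block network, starting with 2 trainable blocks and
# unfreezing one more block every 100 steps.
layers = [Layer(f"block{i}") for i in range(4)]
hook = make_unfreeze_schedule(layers, initial_trainable=2, unfreeze_every=100)
hook(0)    # blocks 0-1 trainable
hook(100)  # block 2 becomes trainable
hook(200)  # block 3 becomes trainable
```

In a framework like PyTorch, flipping the `trainable` flag would correspond to setting `requires_grad` on the layer's parameters; the paper's second capacity-increase mechanism is elided in the truncated abstract and is not sketched here.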