[2511.20629] MapReduce LoRA: Advancing the Pareto Front in Multi-Preference Optimization for Generative Models
Summary
The paper presents MapReduce LoRA, a novel framework for optimizing generative models by addressing multi-preference alignment issues. It introduces two complementary methods that enhance model performance across tasks, demonstrating significant improvements in generative quality metrics.
Why It Matters
This research is significant as it tackles the challenge of aligning generative models with multiple human preferences, a crucial aspect for applications in AI-driven content creation. By advancing the state-of-the-art in multi-preference optimization, it enhances the usability and effectiveness of generative models across various domains.
Key Takeaways
- MapReduce LoRA introduces a new approach to optimize generative models for multiple preferences.
- The framework shows substantial improvements in generative tasks, including text-to-image and text-to-video generation.
- It employs parallel training of preference-specific experts to refine a shared model effectively.
- Reward-aware Token Embedding (RaTE) enhances flexibility in preference control during inference.
- The study sets a new benchmark for multi-preference alignment in generative models.
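The parallel-experts-then-merge loop in the takeaways above can be sketched as follows. This is a minimal simulation, not the paper's implementation: `train_lora_expert` is a hypothetical stand-in for per-reward RLHF fine-tuning, and uniform averaging is an assumed merge rule (the paper's actual reduce step may differ).

```python
import numpy as np

def train_lora_expert(base, reward_id, rank=4):
    # Hypothetical stand-in for fine-tuning one preference-specific
    # LoRA expert: returns a low-rank update A @ B for `base`.
    rng = np.random.default_rng(reward_id)
    d = base.shape[0]
    A = rng.normal(scale=0.01, size=(d, rank))
    B = rng.normal(scale=0.01, size=(rank, d))
    return A @ B

def mapreduce_lora(base, n_rewards=3, n_rounds=2):
    # Map: train one LoRA expert per reward (in parallel in the paper,
    # sequentially here). Reduce: merge the experts into the shared
    # base (uniform averaging is an assumption), then repeat from the
    # refined base for the next round.
    for _ in range(n_rounds):
        deltas = [train_lora_expert(base, r) for r in range(n_rewards)]
        base = base + np.mean(deltas, axis=0)
    return base

base = np.eye(8)          # toy weight matrix standing in for the model
refined = mapreduce_lora(base)
print(refined.shape)      # (8, 8)
```

The key property the sketch illustrates is that each round starts every expert from the same merged checkpoint, so no single reward dimension drifts away from the shared model.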
Computer Science > Computer Vision and Pattern Recognition
arXiv:2511.20629 (cs)
[Submitted on 25 Nov 2025 (v1), last revised 23 Feb 2026 (this version, v4)]
Title: MapReduce LoRA: Advancing the Pareto Front in Multi-Preference Optimization for Generative Models
Authors: Chieh-Yun Chen, Zhonghao Wang, Qi Chen, Zhifan Ye, Min Shi, Yue Zhao, Yinan Zhao, Hui Qu, Wei-An Lin, Yiru Shen, Ajinkya Kale, Irfan Essa, Humphrey Shi
Abstract: Reinforcement learning from human feedback (RLHF) with reward models has advanced alignment of generative models to human aesthetic and perceptual preferences. However, jointly optimizing multiple rewards often incurs an alignment tax, improving one dimension while degrading others. To address this, we introduce two complementary methods: MapReduce LoRA and Reward-aware Token Embedding (RaTE). MapReduce LoRA trains preference-specific LoRA experts in parallel and iteratively merges them to refine a shared base model; RaTE learns reward-specific token embeddings that compose at inference for flexible preference control. Experiments on Text-to-Image generation (Stable Diffusion 3.5 Medium and FLUX.1-dev) show improvements of 36.1%, 4.6%, and 55.7%, and 32.7%, 4.3%, and 67.1% on GenEval, PickScore, and OCR, respectively. On Text-to-Video generation (HunyuanVi...
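The abstract's second mechanism, RaTE, composes reward-specific token embeddings at inference time. A minimal sketch of that composition step, under stated assumptions: the reward names, embedding dimension, and the weighted-average combination rule are all illustrative, and the embeddings here are random placeholders for vectors the paper would learn during training.

```python
import numpy as np

EMB_DIM = 16

# Placeholder reward-specific token embeddings (learned in the paper;
# random here). The reward names are hypothetical examples.
reward_tokens = {
    "aesthetics": np.random.default_rng(0).normal(size=EMB_DIM),
    "text_accuracy": np.random.default_rng(1).normal(size=EMB_DIM),
}

def compose_rate(weights):
    # Combine reward embeddings into a single steering vector at
    # inference time; the user-chosen weights control how strongly
    # each preference is expressed. Weighted averaging is an assumed
    # composition rule, not necessarily the paper's.
    total = sum(weights.values())
    return sum(w * reward_tokens[name] for name, w in weights.items()) / total

# Example: favor aesthetics over text rendering for this generation.
steer = compose_rate({"aesthetics": 0.7, "text_accuracy": 0.3})
print(steer.shape)  # (16,)
```

The point of the sketch is the flexibility claim from the abstract: because the reward embeddings compose additively at inference, preference trade-offs can be adjusted per prompt without retraining the model.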