[2603.19741] FedPDPO: Federated Personalized Direct Preference Optimization for Large Language Model Alignment
Computer Science > Machine Learning
arXiv:2603.19741 (cs)
[Submitted on 20 Mar 2026]

Title: FedPDPO: Federated Personalized Direct Preference Optimization for Large Language Model Alignment
Authors: Kewen Zhu, Liping Yi, Zhiming Zhao, Zhuang Qi, Han Yu, Qinghua Hu

Abstract: Aligning large language models (LLMs) with human preferences in federated learning (FL) is challenging because preference data is decentralized, privacy-sensitive, and highly non-IID. Direct Preference Optimization (DPO) offers an efficient alternative to reinforcement learning with human feedback (RLHF), but applying it directly in FL suffers severe performance degradation under non-IID data and limited generalization of its implicit rewards. To bridge this gap, we propose FedPDPO (Federated Personalized Direct Preference Optimization), a personalized federated framework for preference alignment of LLMs. It adopts a parameter-efficient fine-tuning architecture in which each client maintains a frozen pretrained LLM backbone augmented with a Low-Rank Adaptation (LoRA) adapter, enabling communication-efficient aggregation. To address non-IID heterogeneity, we devise (1) a globally shared LoRA adapter paired with a personalized, client-specific LLM head. Moreover, we introduce (2) a personalized DPO training strategy with a clie...
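The abstract names two concrete ingredients: the standard DPO objective optimized locally on each client, and federated aggregation restricted to the shared LoRA adapter while the client-specific head stays local. The sketch below illustrates both under stated assumptions; it is not the authors' implementation, and the function names (dpo_loss, aggregate_lora), the "lora_" key convention, and the toy tensors are hypothetical. Only the DPO loss formula itself is the standard one from the DPO literature.

```python
# Hypothetical sketch (not the FedPDPO code): local DPO loss plus
# FedAvg-style aggregation over shared LoRA tensors only, with the
# personalized head kept client-local and never sent to the server.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO objective: -log sigmoid(beta * (policy margin - reference margin))."""
    policy_margin = policy_chosen_logps - policy_rejected_logps
    ref_margin = ref_chosen_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()

def aggregate_lora(client_states, weights):
    """Weighted average of LoRA adapter tensors across clients.
    Head parameters are assumed absent from the uploaded state dicts,
    so personalization is preserved by construction."""
    lora_keys = [k for k in client_states[0] if "lora_" in k]
    return {
        k: sum(w * s[k] for w, s in zip(weights, client_states))
        for k in lora_keys
    }

# Toy usage with fabricated log-probabilities and two clients' LoRA tensors.
loss = dpo_loss(torch.tensor([-1.0]), torch.tensor([-2.0]),
                torch.tensor([-1.5]), torch.tensor([-1.8]))
clients = [{"lora_A": torch.ones(2, 2)}, {"lora_A": torch.zeros(2, 2)}]
global_lora = aggregate_lora(clients, weights=[0.5, 0.5])
```

In this reading, communication efficiency comes from uploading only the low-rank adapter, and non-IID robustness comes from letting each client's head absorb preference idiosyncrasies that the shared adapter should not average away.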