[2604.04410] Relative Density Ratio Optimization for Stable and Statistically Consistent Model Alignment
Computer Science > Machine Learning
arXiv:2604.04410 (cs)
[Submitted on 6 Apr 2026]
Title: Relative Density Ratio Optimization for Stable and Statistically Consistent Model Alignment
Authors: Hiroshi Takahashi, Tomoharu Iwata, Atsutoshi Kumagai, Sekitoshi Kanai, Masanori Yamada, Kosuke Nishida, Kazutoshi Shinoda
Abstract: Aligning language models with human preferences is essential for ensuring their safety and reliability. Most existing approaches assume a specific human preference model, such as the Bradley-Terry model; this assumption may fail to capture true human preferences, and consequently these methods lack statistical consistency, i.e., the guarantee that the language model converges to the true human preference as the number of samples increases. In contrast, direct density ratio optimization (DDRO) achieves statistical consistency without assuming any human preference model. DDRO models the density ratio between the preferred and non-preferred data distributions using the language model, and then optimizes it via density ratio estimation. However, this density ratio is unstable and often diverges, leading to training instability in DDRO. In this paper, we propose a novel alignment method that is both stable and statistically consistent. Our approach is based on ...
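The instability described in the abstract can be illustrated numerically. A plain density ratio p(x)/q(x) is unbounded wherever the non-preferred density q has thin tails relative to p, whereas the *relative* density ratio p(x) / (α·p(x) + (1−α)·q(x)), a standard construction in the density ratio estimation literature, is bounded above by 1/α. The sketch below uses two hypothetical Gaussians standing in for the preferred and non-preferred distributions; it illustrates the boundedness property only, not the paper's actual method, which is truncated in this abstract.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Illustrative stand-ins (not from the paper): p = "preferred", q = "non-preferred".
p = lambda x: normal_pdf(x, 0.0, 1.0)
q = lambda x: normal_pdf(x, 2.0, 1.0)

def plain_ratio(x):
    # p(x)/q(x): unbounded, diverges in the left tail where q is tiny.
    return p(x) / q(x)

def relative_ratio(x, alpha=0.1):
    # p(x) / (alpha*p(x) + (1-alpha)*q(x)): bounded above by 1/alpha.
    px, qx = p(x), q(x)
    return px / (alpha * px + (1 - alpha) * qx)

for x in [0.0, -2.0, -5.0]:
    print(f"x={x:+.1f}  plain={plain_ratio(x):12.2f}  relative={relative_ratio(x):6.3f}")
```

Moving further into the left tail, the plain ratio grows without bound (at x = −5 it equals exp(12) ≈ 1.6 × 10^5), while the relative ratio stays below 1/α = 10, which is the kind of boundedness that makes a ratio-based objective easier to optimize stably.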