[2511.11159] Adaptive Symmetrization of the KL Divergence
Computer Science > Machine Learning
arXiv:2511.11159 (cs)
[Submitted on 14 Nov 2025 (v1), last revised 8 Apr 2026 (this version, v2)]

Title: Adaptive Symmetrization of the KL Divergence
Authors: Omri Ben-Dov, Luiz F.O. Chamon

Abstract: Many tasks in machine learning can be described as or reduced to learning a probability distribution given a finite set of samples. A common approach is to minimize a statistical divergence between the (empirical) data distribution and a parameterized distribution, e.g., a normalizing flow (NF) or an energy-based model (EBM). In this context, the forward KL divergence is ubiquitous due to its tractability, though its asymmetry may prevent it from capturing some properties of the target distribution. Symmetric alternatives either involve brittle min-max formulations and adversarial training (e.g., generative adversarial networks) or require evaluating the reverse KL divergence, as is the case for the symmetric Jeffreys divergence, which is challenging to compute from samples. This work sets out to develop a new approach to minimizing the Jeffreys divergence. To do so, it uses a proxy model whose goal is not only to fit the data, but also to assist in optimizing the Jeffreys divergence of the main model. This joint training task is formulated as a constrained optimization problem to obtain a practical algorithm that adapts the mo...
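As background for the abstract's terminology (this illustrates the standard definitions, not the paper's algorithm), a minimal sketch of the forward KL, reverse KL, and their sum, the symmetric Jeffreys divergence, for two hypothetical discrete distributions:

```python
import numpy as np

def kl(p, q):
    """Forward KL divergence KL(p || q) for discrete distributions
    with full support (all entries strictly positive)."""
    return float(np.sum(p * np.log(p / q)))

# Two example discrete distributions (illustrative values, not from the paper).
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.4, 0.4, 0.2])

forward = kl(p, q)            # KL(p || q): tractable from samples of p
reverse = kl(q, p)            # KL(q || p): the "reverse" direction
jeffreys = forward + reverse  # symmetric Jeffreys divergence

# The asymmetry the abstract refers to: forward != reverse in general.
print(f"forward={forward:.4f} reverse={reverse:.4f} jeffreys={jeffreys:.4f}")
```

Minimizing only `forward` is the ubiquitous tractable objective; the Jeffreys divergence adds the `reverse` term, which is the part that is hard to estimate from samples alone and which the paper's proxy-model formulation targets.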