[2507.17316] Nearly Minimax Discrete Distribution Estimation in Kullback-Leibler Divergence with High Probability
Summary
This paper studies high-probability estimation of a discrete distribution on a domain of size $K$ in Kullback-Leibler divergence, providing upper and lower bounds that pin down the minimax rate up to a doubly logarithmic factor in $K$, and explaining why tight lower bounds are harder to obtain here than for related distances.
Why It Matters
Understanding minimax estimation in Kullback-Leibler divergence is crucial for statisticians and machine learning practitioners, as it impacts how we assess and improve models' predictive accuracy. This research highlights the complexities involved in achieving optimal estimation rates, which can inform future methodologies in statistical learning.
Key Takeaways
- Bounds the minimax rate for high-probability discrete distribution estimation in Kullback-Leibler divergence from above and below, pinning it down up to a $\ln\ln K$ factor.
- Shows that the tail behavior of the minimax rate is worse than for squared total variation and squared Hellinger distance, so the usual reduction to those distances cannot yield a fully tight lower bound.
- Constructs a novel estimator via online-to-batch conversion, using techniques from online learning (see the sketch after this list).
- Discusses how the outcome probabilities of the underlying distribution affect the difficulty of minimax estimation.
- Highlights the gap between minimax and asymptotic rates for Kullback-Leibler estimation.
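
As a rough illustration of the online-to-batch idea, the following sketch is not the authors' estimator: it runs a simple add-alpha online predictor (the Krichevsky-Trofimov rule for alpha = 0.5) over the sample and returns the average of its predictive distributions as the batch estimate. The function name and all parameters are ours.

```python
import numpy as np

def online_to_batch_estimate(samples, K, alpha=0.5):
    """Average the predictive distributions of an add-alpha online learner
    (Krichevsky-Trofimov for alpha=0.5) over one pass through the sample.
    The averaged prediction is returned as the batch estimate."""
    counts = np.zeros(K)
    avg = np.zeros(K)
    for t, x in enumerate(samples):
        # Predictive distribution of the online learner before seeing x_t.
        pred = (counts + alpha) / (t + alpha * K)
        avg += pred
        counts[x] += 1
    # Also include the prediction made after the final observation.
    avg += (counts + alpha) / (len(samples) + alpha * K)
    return avg / (len(samples) + 1)

# Example: estimate a distribution on K=5 outcomes from n=1000 draws.
rng = np.random.default_rng(0)
p = np.array([0.4, 0.3, 0.15, 0.1, 0.05])
samples = rng.choice(5, size=1000, p=p)
p_hat = online_to_batch_estimate(samples, K=5)
kl = np.sum(p * np.log(p / p_hat))
print(p_hat.round(3), f"KL(p || p_hat) = {kl:.4f}")
```

Because every predictive distribution in the average puts positive mass on all K outcomes, the resulting estimate never assigns zero probability, which keeps the Kullback-Leibler divergence finite.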
Statistics > Machine Learning
arXiv:2507.17316 (stat)
[Submitted on 23 Jul 2025 (v1), last revised 20 Feb 2026 (this version, v3)]
Title: Nearly Minimax Discrete Distribution Estimation in Kullback-Leibler Divergence with High Probability
Authors: Dirk van der Hoeven, Julia Olkhovskaia, Tim van Erven
Abstract: We consider the fundamental problem of estimating a discrete distribution on a domain of size $K$ with high probability in Kullback-Leibler divergence. We provide upper and lower bounds on the minimax estimation rate, which show that the optimal rate is between $\big(K + \ln(K)\ln(1/\delta)\big)/n$ and $\big(K\ln\ln(K) + \ln(K)\ln(1/\delta)\big)/n$ at error probability $\delta$ and sample size $n$, which pins down the rate up to the doubly logarithmic factor $\ln \ln K$ that multiplies $K$. Our upper bound uses techniques from online learning to construct a novel estimator via online-to-batch conversion. Perhaps surprisingly, the tail behavior of the minimax rate is worse than for the squared total variation and squared Hellinger distance, for which it is $\big(K + \ln(1/\delta)\big)/n$, i.e. without the $\ln K$ multiplying $\ln(1/\delta)$. As a consequence, we cannot obtain a fully tight lower bound from the usual reduction to these smaller distances. Moreover, we show that this lowe...
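
As a quick numerical illustration (the function and example values below are ours, not from the paper), this sketch evaluates the two rate expressions quoted in the abstract, so the remaining $\ln\ln K$ gap between them can be seen directly.

```python
import math

def kl_rate_bounds(K, n, delta):
    """Evaluate the lower and upper minimax-rate expressions from the abstract:
    (K + ln(K) ln(1/delta)) / n  and  (K ln ln(K) + ln(K) ln(1/delta)) / n."""
    log_term = math.log(K) * math.log(1.0 / delta)
    lower = (K + log_term) / n
    upper = (K * math.log(math.log(K)) + log_term) / n
    return lower, upper

# Example: domain size K = 1000, sample size n = 10^5, error probability 1%.
lo, hi = kl_rate_bounds(K=1000, n=100_000, delta=0.01)
print(f"lower bound ≈ {lo:.4f}, upper bound ≈ {hi:.4f}")
# The two expressions differ only in the ln ln(K) factor multiplying K.
```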