[2507.01761] Enhanced Generative Model Evaluation with Clipped Density and Coverage
Summary
This article presents two novel metrics, Clipped Density and Clipped Coverage, which improve the evaluation of generative models by making sample-quality assessment more robust and interpretable.
Why It Matters
Reliable evaluation of generative models is crucial for their application in critical areas. The proposed metrics address existing shortcomings in current evaluation methods, offering a more robust framework that can lead to better model performance and trustworthiness in real-world applications.
Key Takeaways
- Introduces Clipped Density and Clipped Coverage metrics for evaluating generative models.
- Metrics improve robustness against out-of-distribution samples and enhance interpretability.
- Demonstrated effectiveness through extensive experiments on both synthetic and real-world datasets.
- Provides a framework for understanding sample quality in terms of good sample proportions.
- Addresses the need for reliable evaluation in critical applications of generative models.
Computer Science > Machine Learning
arXiv:2507.01761 (cs)
[Submitted on 2 Jul 2025 (v1), last revised 17 Feb 2026 (this version, v3)]
Title: Enhanced Generative Model Evaluation with Clipped Density and Coverage
Authors: Nicolas Salvy, Hugues Talbot, Bertrand Thirion
Abstract: Although generative models have made remarkable progress in recent years, their use in critical applications has been hindered by an inability to reliably evaluate the quality of their generated samples. Quality refers to at least two complementary concepts: fidelity and coverage. Current quality metrics often lack reliable, interpretable values due to an absence of calibration or insufficient robustness to outliers. To address these shortcomings, we introduce two novel metrics: Clipped Density and Clipped Coverage. By clipping individual sample contributions, as well as the radii of nearest neighbor balls for fidelity, our metrics prevent out-of-distribution samples from biasing the aggregated values. Through analytical and empirical calibration, these metrics demonstrate linear score degradation as the proportion of bad samples increases. Thus, they can be straightforwardly interpreted as equivalent proportions of good samples. Extensive experiments on synthetic and real-world datasets demonstrate that Clipped Density and Clipped Coverage outperform exist...
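The abstract describes the core mechanism: standard Density and Coverage count how generated samples fall inside k-nearest-neighbor balls around real samples, and the clipped variants cap both the per-sample contributions and (for fidelity) the ball radii so that outliers cannot inflate the aggregate. The sketch below is an illustrative NumPy rendering of that idea, not the paper's exact formulation; the function names, the quantile-based radius cap (`radius_q`), and the contribution cap (`clip`) are assumptions chosen for the example.

```python
import numpy as np

def knn_radii(X, k):
    """Distance from each point in X to its k-th nearest neighbor within X."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    d.sort(axis=1)
    return d[:, k]  # column 0 is the zero self-distance

def clipped_density(real, fake, k=5, radius_q=0.9, clip=1.0):
    # Hypothetical sketch: cap k-NN ball radii at a quantile of their
    # distribution, and cap each generated sample's contribution at `clip`,
    # so out-of-distribution points cannot dominate the average.
    r = knn_radii(real, k)
    r = np.minimum(r, np.quantile(r, radius_q))        # clip radii
    d = np.linalg.norm(fake[:, None, :] - real[None, :, :], axis=-1)
    contrib = (d < r[None, :]).sum(axis=1) / k          # per-sample contribution
    contrib = np.minimum(contrib, clip)                 # clip contributions
    return contrib.mean()

def clipped_coverage(real, fake, k=5, radius_q=0.9):
    # A real sample is "covered" if some generated sample lands inside
    # its (radius-clipped) k-NN ball; the score is the covered fraction.
    r = knn_radii(real, k)
    r = np.minimum(r, np.quantile(r, radius_q))
    d = np.linalg.norm(real[:, None, :] - fake[None, :, :], axis=-1)
    return (d.min(axis=1) < r).mean()
```

With the contribution cap at 1.0, both scores stay in [0, 1], which is what gives them the "equivalent proportion of good samples" reading the paper emphasizes.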