[2602.14682] Exposing Diversity Bias in Deep Generative Models: Statistical Origins and Correction of Diversity Error

arXiv - AI · 4 min read

Summary

This paper investigates the diversity bias in deep generative models, revealing that these models often underestimate the diversity of the underlying data distribution and proposing methods to correct this bias.

Why It Matters

Understanding and correcting diversity bias in generative models is crucial for improving their performance and reliability in various applications. This research highlights the limitations of current models and offers strategies for enhancing their ability to capture data diversity, which is essential for fair and effective AI systems.

Key Takeaways

  • Deep generative models often exhibit a systematic downward diversity bias.
  • Diversity scores from generated samples are consistently lower than those from actual data.
  • Finite sample sizes can lead to underestimating true data diversity.
  • Optimizing for empirical data distributions may reduce diversity.
  • Diversity-aware regularization strategies can mitigate bias effectively.
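The takeaways above hinge on how entropy-based diversity scores behave at finite sample sizes. The following is a minimal sketch in the spirit of the Vendi score (the exponential of the von Neumann entropy of a normalized similarity matrix); the RBF kernel, bandwidth, and the toy Gaussian data are illustrative assumptions, not the paper's exact experimental setup:

```python
import numpy as np

def vendi_score(X, bandwidth=1.0):
    """Exponential of the von Neumann entropy of the normalized
    similarity matrix, in the spirit of the Vendi score.  The RBF
    kernel and bandwidth here are illustrative assumptions."""
    n = len(X)
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / (2 * bandwidth ** 2))   # similarity matrix, K[i, i] == 1
    lam = np.linalg.eigvalsh(K / n)          # eigenvalues sum to 1
    lam = lam[lam > 1e-12]                   # drop numerical zeros
    return float(np.exp(-np.sum(lam * np.log(lam))))

# Same distribution, two sample sizes: the empirical score grows
# with n, illustrating the finite-sample effect the paper analyzes.
rng = np.random.default_rng(0)
data = rng.normal(size=(500, 2))
small = vendi_score(data[:50])
large = vendi_score(data)
```

The score behaves like an "effective number of modes": n identical points score 1, and n well-separated points score n. Because the empirical score increases with sample size, comparing generated and real samples at mismatched sample sizes can masquerade as (or hide) a genuine diversity gap.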

Computer Science > Machine Learning
arXiv:2602.14682 (cs) · [Submitted on 16 Feb 2026]

Title: Exposing Diversity Bias in Deep Generative Models: Statistical Origins and Correction of Diversity Error
Authors: Farzan Farnia, Mohammad Jalali, Azim Ospanov

Abstract: Deep generative models have achieved great success in producing high-quality samples, making them a central tool across machine learning applications. Beyond sample quality, an important yet less systematically studied question is whether trained generative models faithfully capture the diversity of the underlying data distribution. In this work, we address this question by directly comparing the diversity of samples generated by state-of-the-art models with that of test samples drawn from the target data distribution, using recently proposed reference-free entropy-based diversity scores, Vendi and RKE. Across multiple benchmark datasets, we find that test data consistently attains substantially higher Vendi and RKE diversity scores than the generated samples, suggesting a systematic downward diversity bias in modern generative models. To understand the origin of this bias, we analyze the finite-sample behavior of entropy-based diversity scores and show that their expected values increase with sample size, implying that ...
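The abstract's second score, RKE, admits an especially cheap estimator. A minimal sketch of an order-2 Rényi-entropy diversity score in that spirit (again assuming an RBF kernel for illustration; the paper's exact kernel choice may differ): since the sum of squared eigenvalues of K/n equals the squared Frobenius norm of K/n, the score reduces to a sum of squared kernel entries, with no eigendecomposition required:

```python
import numpy as np

def rke_score(X, bandwidth=1.0):
    """Order-2 Renyi variant: 1 / sum(lambda_i^2) over eigenvalues of
    K/n.  Because sum(lambda_i^2) == ||K/n||_F^2, this is just
    n^2 / sum(K**2) -- no eigendecomposition needed.  The RBF kernel
    and bandwidth are illustrative assumptions."""
    n = len(X)
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / (2 * bandwidth ** 2))
    return float(n ** 2 / np.sum(K ** 2))
```

Like the Vendi sketch, this reads as an effective mode count: n identical points score 1, n well-separated points score n, and the empirical value grows with sample size toward its population limit, which is exactly the finite-sample bias the abstract describes.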
