[2602.12659] IndicFairFace: Balanced Indian Face Dataset for Auditing and Mitigating Geographical Bias in Vision-Language Models
Summary
The paper introduces IndicFairFace, a balanced dataset aimed at addressing geographical bias in Vision-Language Models (VLMs) by representing India's diverse demographics through 14,400 images.
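A claim that a dataset is balanced can be checked mechanically. The sketch below shows one way to verify balance from image metadata. It assumes a hypothetical record schema with `region` and `gender` fields. This summary does not describe the dataset's actual metadata format.

```python
from collections import Counter

def cell_counts(records):
    """Count images per (region, gender) cell.

    Each record is assumed (hypothetically) to be a dict with "region" and "gender" keys.
    """
    return Counter((r["region"], r["gender"]) for r in records)

def is_uniformly_balanced(records, regions, genders):
    """True if every expected (region, gender) cell exists and all cells hold the same count."""
    counts = cell_counts(records)
    expected = {(region, gender) for region in regions for gender in genders}
    if set(counts) != expected:  # a missing or unexpected cell breaks balance
        return False
    return len(set(counts.values())) == 1
```

Checking the cell set separately from the counts catches a region that is missing entirely. A count-only check would miss that case.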
Why It Matters
As AI systems increasingly influence societal outcomes, biases inherited from training data become a practical concern. Existing fairness-aware face datasets treat "Indian" as a single category. IndicFairFace fills this gap by supporting the auditing and mitigation of geographical bias across Indian states and Union Territories, which contributes to fairer AI applications.
Key Takeaways
- IndicFairFace comprises 14,400 face images that reflect India's geographical diversity, balanced across states and gender.
- The dataset aims to mitigate representational bias in Vision-Language Models.
- Post-hoc debiasing reduced the measured geographical bias in CLIP-based VLMs without significantly affecting model accuracy.
- The work highlights the importance of nuanced demographic representation in AI training data.
- IndicFairFace provides a benchmark for future studies of intra-national geographical bias in AI.
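Post-hoc debiasing acts on a trained model's embeddings instead of retraining the model. The abstract truncates the name of the specific method, so what follows is only a generic illustration of one common family of post-hoc embedding debiasing, not the authors' procedure. The idea is to project embeddings onto the subspace orthogonal to an estimated bias direction. The direction estimate (a difference of two group means) and all function names are hypothetical.

```python
import numpy as np

def mean_difference_direction(embeddings, groups, group_a, group_b):
    """Hypothetical bias-direction estimate: difference of two group means."""
    groups = np.asarray(groups)
    return embeddings[groups == group_a].mean(axis=0) - embeddings[groups == group_b].mean(axis=0)

def nullspace_projection(direction):
    """Projection matrix onto the subspace orthogonal to `direction`."""
    w = np.asarray(direction, dtype=float)
    w = w / np.linalg.norm(w)
    return np.eye(w.size) - np.outer(w, w)

def debias(embeddings, direction):
    """Remove each row's component along `direction`, then re-normalize rows to unit length."""
    # P is symmetric, so E @ P projects every row of E.
    projected = embeddings @ nullspace_projection(direction)
    norms = np.linalg.norm(projected, axis=1, keepdims=True)
    # Guard against rows that were exactly parallel to the removed direction.
    return projected / np.clip(norms, 1e-12, None)
```

The rows are re-normalized because CLIP-style models compare unit-length embeddings by cosine similarity. Iterative projection methods generally repeat this step and re-estimate a new direction after each projection. This sketch performs a single step.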
arXiv:2602.12659 (Computer Science > Computer Vision and Pattern Recognition), submitted 13 Feb 2026
Authors: Aarish Shah Mohsin, Mohammed Tayyab Ilyas Khan, Mohammad Nadeem, Shahab Saquib Sohail, Erik Cambria, Jiechao Gao
Abstract
Vision-Language Models (VLMs) are known to inherit and amplify societal biases from their web-scale training data, and Indians are particularly misrepresented. Existing fairness-aware datasets have significantly improved demographic balance across global race and gender groups, yet they continue to treat Indian as a single monolithic category. This oversimplification ignores the vast intra-national diversity across the 28 states and 8 Union Territories of India and leads to representational and geographical bias. To address this limitation, we present IndicFairFace, a novel and balanced face dataset comprising 14,400 images representing the geographical diversity of India. Images were sourced ethically from Wikimedia Commons and open-license web repositories and uniformly balanced across states and gender. Using IndicFairFace, we quantify intra-national geographical bias in prominent CLIP-based VLMs and reduce it using post-hoc Itera...
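One way to quantify intra-national geographical bias in a CLIP-style model is to run zero-shot region classification and compare accuracy across groups. The sketch below assumes precomputed image and text embeddings. The text embeddings could, for example, come from a hypothetical prompt template such as "a photo of a person from {state}". The best-minus-worst accuracy gap serves here as an illustrative metric only, because this excerpt does not state the paper's actual audit metrics.

```python
import numpy as np

def zero_shot_predict(image_emb, text_emb):
    """Index of the most similar text prompt for each image, by cosine similarity."""
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    return np.argmax(img @ txt.T, axis=1)

def per_group_accuracy(preds, labels, groups):
    """Zero-shot accuracy computed separately for each group (e.g. state or gender)."""
    preds, labels, groups = np.asarray(preds), np.asarray(labels), np.asarray(groups)
    return {str(g): float(np.mean(preds[groups == g] == labels[groups == g]))
            for g in np.unique(groups)}

def accuracy_gap(acc_by_group):
    """Illustrative disparity metric: best-group minus worst-group accuracy."""
    return max(acc_by_group.values()) - min(acc_by_group.values())
```

A gap near zero means the model serves all groups about equally well on this task. Running the same audit on embeddings before and after debiasing shows whether the gap shrinks.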