[2603.02475] Large-Scale Dataset and Benchmark for Skin Tone Classification in the Wild

arXiv - Machine Learning · March 04, 2026

Computer Science > Computer Vision and Pattern Recognition
arXiv:2603.02475 (cs)
[Submitted on 2 Mar 2026]

Authors: Vitor Pereira Matias, Márcus Vinícius Lobo Costa, João Batista Neto, Tiago Novello de Brito

Abstract: Deep learning models often inherit biases from their training data. While fairness across gender and ethnicity is well-studied, fine-grained skin tone analysis remains a challenge due to the lack of granular, annotated datasets. Existing methods often rely on the medical 6-tone Fitzpatrick scale, which lacks visual representativeness, or on small, private datasets that prevent reproducibility. Most use classic computer vision pipelines, with only a few using deep learning. They also overlook issues such as train-test leakage and dataset imbalance.

In this work, we present a comprehensive framework for skin tone fairness. First, we introduce the STW, a large-scale, open-access dataset comprising 42,313 images from 3,564 individuals, labeled using the 10-tone MST scale. Second, we benchmark both Classic Computer Vision (SkinToneCCV) and Deep Learning approaches, demonstrating that classic models provide near-random results, while deep learning reaches nearly ann...
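The abstract names train-test leakage as an issue that earlier work overlooks. When a dataset holds several images per individual, as STW does, one common safeguard is to split by individual instead of by image, so that no person contributes to both sets. The sketch below illustrates that idea on toy data. It is not the paper's protocol, which this excerpt does not describe; the function name `split_by_individual`, the `(image_id, person_id)` record layout, and the toy file names are all hypothetical.

```python
import random


def split_by_individual(records, test_fraction=0.2, seed=0):
    """Split (image_id, person_id) pairs so that no person is in both sets.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Sort the unique person IDs first so the seeded shuffle is reproducible.
    people = sorted({person_id for _, person_id in records})
    random.Random(seed).shuffle(people)

    # Hold out a fraction of the individuals (at least one) for testing.
    n_test = max(1, round(len(people) * test_fraction))
    test_people = set(people[:n_test])

    train = [(img, p) for img, p in records if p not in test_people]
    test = [(img, p) for img, p in records if p in test_people]
    return train, test


# Toy data: 10 made-up individuals with 3 images each.
records = [
    (f"img_{p}_{i}.jpg", f"person_{p}") for p in range(10) for i in range(3)
]
train, test = split_by_individual(records, test_fraction=0.2, seed=42)

train_people = {p for _, p in train}
test_people = {p for _, p in test}
print(len(train), len(test), train_people.isdisjoint(test_people))
```

Splitting at the image level instead would let near-identical photos of the same person land on both sides, which can inflate test accuracy.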

Originally published on March 04, 2026. Curated by AI News.
