[2603.05267] Beyond Word Error Rate: Auditing the Diversity Tax in Speech Recognition through Dataset Cartography

arXiv - Machine Learning March 06, 2026 3 min read

About this article

Abstract page for arXiv paper 2603.05267: Beyond Word Error Rate: Auditing the Diversity Tax in Speech Recognition through Dataset Cartography

Computer Science > Machine Learning

arXiv:2603.05267 (cs)

[Submitted on 5 Mar 2026]

Title: Beyond Word Error Rate: Auditing the Diversity Tax in Speech Recognition through Dataset Cartography

Authors: Ting-Hui Cheng, Line H. Clemmensen, Sneha Das

Abstract: Automatic speech recognition (ASR) systems are predominantly evaluated using the Word Error Rate (WER). However, raw token-level metrics fail to capture semantic fidelity and routinely obscure the "diversity tax": the disproportionate burden that systematic recognition failures place on marginalized and atypical speakers. In this paper, we explore the limitations of relying solely on lexical counts by systematically evaluating a broader class of non-linear and semantic metrics. To enable rigorous model auditing, we introduce the sample difficulty index (SDI), a novel metric that quantifies how intrinsic demographic and acoustic factors drive model failure. By mapping SDI onto data cartography, we demonstrate that the metrics EmbER and SemDist expose hidden systemic biases and inter-model disagreements that WER ignores. Finally, our findings are a first step towards a robust audit framework for prospective safety analysis, empowering developers to audit and mitigate ASR disparities prior to deployment.

Subjects: Machine Learn...
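
To make the abstract's point about token-level metrics concrete, here is a minimal sketch of the standard WER computation (word-level edit distance divided by the number of reference words). It is not taken from the paper; the function name word_error_rate and the example sentences are illustrative assumptions. Both hypotheses below differ from the reference by one substituted word, so WER scores them identically even though only one of them changes the meaning.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference length.

    Illustrative helper, not code from the paper.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (one row of the Levenshtein table).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example sentences, chosen only for illustration.
reference = "the patient is not allergic"
meaning_changed = "the patient is now allergic"   # "not" -> "now"
meaning_kept = "a patient is not allergic"        # "the" -> "a"

print(word_error_rate(reference, meaning_changed))  # 0.2
print(word_error_rate(reference, meaning_kept))     # 0.2
```

A semantic metric would be expected to separate these two cases; how EmbER, SemDist, and SDI do so is defined in the paper itself and is not reproduced here.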

Originally published on March 06, 2026. Curated by AI News.
