[2602.02734] WAXAL: A Large-Scale Multilingual African Language Speech Corpus
arXiv:2602.02734 (eess): Electrical Engineering and Systems Science, Audio and Speech Processing
Submitted on 2 Feb 2026 (v1); last revised 2 Mar 2026 (this version, v3).

Authors: Abdoulaye Diack, Perry Nelson, Kwaku Agbesi, Angela Nakalembe, MohamedElfatih MohamedKhair, Vusumuzi Dube, Tavonga Siyavora, Subhashini Venugopalan, Jason Hickey, Uche Okonkwo, Abhishek Bapna, Isaac Wiafe, Raynard Dodzi Helegah, Elikem Doe Atsakpo, Charles Nutrokpor, Fiifi Baffoe Payin Winful, Kafui Kwashie Solaga, Jamal-Deen Abdulai, Akon Obu Ekpezu, Audace Niyonkuru, Samuel Rutunda, Boris Ishimwe, Michael Melese, Engineer Bainomugisha, Joyce Nakatumba-Nabende, Andrew Katumba, Claire Babirye, Jonathan Mukiibi, Vincent Kimani, Samuel Kibacia, James Maina, Fridah Emmah, Ahmed Ibrahim Shekarau, Ibrahim Shehu Adamu, Yusuf Abdullahi, Howard Lakougna, Bob MacDonald, Hadar Shemtov, Aisha Walcott-Bryant, Moustapha Cisse, Avinatan Hassidim, Jeff Dean, Yossi Matias

Abstract: The advancement of speech technology has predominantly favored high-resource languages, creating a significant digital divide for speakers of most Sub-Saharan African languages. To address this gap, we introduce WAXAL, a large-scale, openly accessible speech dataset for 24 languages representing over 10...