[2602.21741] Robust Long-Form Bangla Speech Processing: Automatic Speech Recognition and Speaker Diarization

Summary

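This paper describes an end-to-end system for long-form Bangla speech recognition and speaker diarization, submitted to the DL Sprint 4.0 competition on Kaggle. It combines a fine-tuned Whisper medium model, Demucs vocal isolation, and silence-boundary chunking for ASR with a Bengali-fine-tuned segmentation model for diarization.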

Why It Matters

The research addresses the difficulty of processing Bangla speech: the language has a large phoneme inventory, significant dialectal variation, frequent code-mixing with English, and relatively few large-scale labelled corpora. Improving automatic speech recognition (ASR) and speaker diarization for a low-resource language contributes to more inclusive AI technologies and better accessibility for Bengali speakers.

Key Takeaways

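  • Achieved a best Word Error Rate (WER) of 0.36137 on the public leaderboard and 0.37738 on the private leaderboard for Bangla ASR.
  • Reached a best Diarization Error Rate (DER) of 0.20936 (public) and 0.27671 (private) for speaker diarization.
  • Used Demucs source separation for vocal isolation and silence-boundary chunking for long recordings.
  • Relied on domain-specific fine-tuning: a BengaliAI fine-tuned Whisper medium model for ASR and a Bengali-fine-tuned segmentation model for diarization.
  • Identified dialectal variation, code-mixing with English, and the scarcity of large-scale labelled corpora as the main challenges for Bangla.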

Computer Science > Computation and Language

arXiv:2602.21741 (cs) [Submitted on 25 Feb 2026]

Title: Robust Long-Form Bangla Speech Processing: Automatic Speech Recognition and Speaker Diarization

Authors: MD. Sagor Chowdhury, Adiba Fairooz Chowdhury

Abstract: We describe our end-to-end system for Bengali long-form speech recognition (ASR) and speaker diarization submitted to the DL Sprint 4.0 competition on Kaggle. Bengali presents substantial challenges for both tasks: a large phoneme inventory, significant dialectal variation, frequent code-mixing with English, and a relative scarcity of large-scale labelled corpora. For ASR we achieve a best private Word Error Rate (WER) of 0.37738 and public WER of 0.36137, combining a BengaliAI fine-tuned Whisper medium model with Demucs source separation for vocal isolation, silence-boundary chunking, and carefully tuned generation hyperparameters. For speaker diarization we reach a best private Diarization Error Rate (DER) of 0.27671 and public DER of 0.20936 by replacing the default segmentation model inside the this http URL pipeline with a Bengali-fine-tuned variant, pairing it with wespeaker-voxceleb-resnet34-LM embeddings and centroid-based agglomerative clustering. Our experiments demonstrate that domain-specific fine-tuning of the s...
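
The abstract mentions silence-boundary chunking without giving details. The sketch below illustrates the general idea under stated assumptions and is not the authors' implementation: long audio is split into chunks no longer than a maximum duration, cutting at the latest low-energy frame inside each window and falling back to a hard cut when no silence is found. The function name `silence_boundary_chunks`, the 20 ms frame size, and the amplitude threshold are all hypothetical; the 30-second default reflects the input window Whisper models operate on.

```python
from typing import List, Sequence, Tuple


def silence_boundary_chunks(
    samples: Sequence[float],
    sample_rate: int,
    max_chunk_s: float = 30.0,        # Whisper's input window length
    frame_s: float = 0.02,            # hypothetical analysis frame size
    silence_threshold: float = 0.01,  # hypothetical mean-amplitude threshold
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices, cutting at silent frames when possible."""
    frame_len = max(1, int(frame_s * sample_rate))
    max_len = max(frame_len, int(max_chunk_s * sample_rate))
    n = len(samples)

    # A frame counts as silent when its mean absolute amplitude is below the threshold.
    silent_starts = []
    for start in range(0, n, frame_len):
        frame = samples[start:start + frame_len]
        if sum(abs(x) for x in frame) / len(frame) < silence_threshold:
            silent_starts.append(start)

    chunks = []
    chunk_start = 0
    while n - chunk_start > max_len:
        limit = chunk_start + max_len
        # Prefer the latest silent frame inside the window; otherwise cut hard at the limit.
        candidates = [s for s in silent_starts if chunk_start < s <= limit]
        cut = candidates[-1] if candidates else limit
        chunks.append((chunk_start, cut))
        chunk_start = cut
    if chunk_start < n:
        chunks.append((chunk_start, n))
    return chunks
```

Each returned index pair can be used to slice the waveform before transcription. Cutting at silence rather than at a fixed interval reduces the chance of splitting a word across two chunks.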
