[2602.23070] Make It Hard to Hear, Easy to Learn: Long-Form Bengali ASR and Speaker Diarization via Extreme Augmentation and Perfect Alignment

Summary

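This paper, which details a submission to the DL Sprint 4.0 competition, addresses long-form Bengali Automatic Speech Recognition (ASR) and speaker diarization. It introduces Lipi-Ghor-882, an 882-hour multi-speaker Bengali dataset, and systematically evaluates architectures and training approaches for long-duration audio.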

Why It Matters

Bengali ASR has seen significant progress, but long-duration audio and robust speaker diarization remain research gaps, and joint ASR and diarization resources for the language are scarce. This study addresses those gaps with a large dataset and an evaluation of which methods actually help on long-form audio. The results could improve speech technology for Bengali speakers and inform work on other low-resource languages.

Key Takeaways

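  • Introduces Lipi-Ghor-882, an 882-hour multi-speaker Bengali dataset for joint ASR and speaker diarization.
  • Finds that scaling up raw training data alone is ineffective for improving ASR.
  • Identifies targeted fine-tuning on perfectly aligned annotations, paired with synthetic noise and reverberation, as the single most effective ASR approach.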
  • Finds that heuristic post-processing is more effective than model retraining for diarization.
  • Establishes a benchmark for low-resource, long-form speech processing with a Real-Time Factor of ~0.019.
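
To put the Real-Time Factor in the last takeaway in concrete terms, here is a minimal sketch assuming the common definition of RTF as processing time divided by audio duration. The function names are illustrative and do not come from the paper.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF under the common definition: processing time / audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def processing_time(audio_seconds: float, rtf: float) -> float:
    """Estimated wall-clock seconds needed to process audio at a given RTF."""
    return audio_seconds * rtf


# At an RTF of about 0.019, one hour of audio (3600 s) would take
# roughly 68 seconds to process.
one_hour_estimate = processing_time(3600.0, 0.019)
```

An RTF below 1 means the system runs faster than real time; the smaller the value, the more audio can be processed per unit of compute time.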

Computer Science > Sound
arXiv:2602.23070 (cs)
[Submitted on 26 Feb 2026]

Title: Make It Hard to Hear, Easy to Learn: Long-Form Bengali ASR and Speaker Diarization via Extreme Augmentation and Perfect Alignment

Authors: Sanjid Hasan, Risalat Labib, A H M Fuad, Bayazid Hasan

Abstract: Although Automatic Speech Recognition (ASR) in Bengali has seen significant progress, processing long-duration audio and performing robust speaker diarization remain critical research gaps. To address the severe scarcity of joint ASR and diarization resources for this language, we introduce Lipi-Ghor-882, a comprehensive 882-hour multi-speaker Bengali dataset. In this paper, detailing our submission to the DL Sprint 4.0 competition, we systematically evaluate various architectures and approaches for long-form Bengali speech. For ASR, we demonstrate that raw data scaling is ineffective; instead, targeted fine-tuning utilizing perfectly aligned annotations paired with synthetic acoustic degradation (noise and reverberation) emerges as the singular most effective approach. Conversely, for speaker diarization, we observed that global open-source state-of-the-art models (such as Diarizen) performed surprisingly poorly on this complex dataset. Extensive model retraining yielded n...
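
The abstract names synthetic acoustic degradation (noise and reverberation) but this excerpt does not describe how it was implemented. The following is a minimal sketch of one generic way to do it with NumPy, not the authors' pipeline: white Gaussian noise mixed at a target signal-to-noise ratio, and reverberation simulated by convolving with an exponentially decaying noise burst. All function names, the RT60 parameter, and the default values are assumptions chosen for illustration.

```python
import numpy as np


def add_noise(signal: np.ndarray, snr_db: float, rng: np.random.Generator) -> np.ndarray:
    """Mix in white Gaussian noise at the requested signal-to-noise ratio (dB)."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


def synthetic_impulse_response(sample_rate: int, rt60: float,
                               rng: np.random.Generator) -> np.ndarray:
    """Hypothetical room response: noise whose amplitude falls 60 dB over rt60 seconds."""
    length = int(sample_rate * rt60)
    t = np.arange(length) / sample_rate
    decay = np.exp(-3.0 * np.log(10.0) * t / rt60)  # 60 dB amplitude decay at t = rt60
    ir = rng.normal(size=length) * decay
    ir[0] = 1.0  # direct path
    return ir / np.max(np.abs(ir))


def add_reverb(signal: np.ndarray, ir: np.ndarray) -> np.ndarray:
    """Convolve with the impulse response, trim to input length, match the input peak."""
    wet = np.convolve(signal, ir)[: len(signal)]
    peak = np.max(np.abs(wet))
    if peak == 0.0:
        return wet
    return wet * (np.max(np.abs(signal)) / peak)


def degrade(signal: np.ndarray, sample_rate: int, snr_db: float = 10.0,
            rt60: float = 0.3, seed: int = 0) -> np.ndarray:
    """Apply reverberation first, then additive noise (an assumed ordering)."""
    rng = np.random.default_rng(seed)
    ir = synthetic_impulse_response(sample_rate, rt60, rng)
    return add_noise(add_reverb(signal, ir), snr_db, rng)
```

In practice, recorded noise clips and measured room impulse responses are often used in place of the synthetic ones here. The sketch only shows the mechanics of making training audio harder to hear while leaving its transcript and alignment untouched.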
