[2504.02293] Breaking the Silence: A Dataset and Benchmark for Bangla Text-to-Gloss Translation

arXiv - AI · March 24, 2026 · 4 min read

About this article

Abstract page for arXiv paper 2504.02293: Breaking the Silence: A Dataset and Benchmark for Bangla Text-to-Gloss Translation

Computer Science > Computation and Language

arXiv:2504.02293 (cs)

[Submitted on 3 Apr 2025 (v1), last revised 22 Mar 2026 (this version, v2)]

Title: Breaking the Silence: A Dataset and Benchmark for Bangla Text-to-Gloss Translation

Authors: Sharif Mohammad Abdullah, Abhijit Paul, Shubhashis Roy Dipta, Zarif Masud, Shebuti Rayana, Ahmedul Kabir

Abstract: Gloss is a written approximation that bridges Sign Language (SL) and its corresponding spoken language. Despite a deaf and hard-of-hearing population of at least 3 million in Bangladesh, Bangla Sign Language (BdSL) remains largely understudied, with no prior work on Bangla text-to-gloss translation and no publicly available datasets. To address this gap, we construct the first Bangla text-to-gloss dataset, consisting of 1,000 manually annotated and 4,000 synthetically generated Bangla sentence-gloss pairs, along with 159 expert human-annotated pairs used as a test set. Our experimental framework performs a comparative analysis between several fine-tuned open-source models and a leading closed-source LLM to evaluate their performance in low-resource BdSL translation. GPT-5.4 achieves the best overall performance, while a fine-tuned mBART model performs competitively despite being approximately 100 times smaller. Qwen-3 outperforms all other mod...

Originally published on March 24, 2026. Curated by AI News.
