[2602.22039] TG-ASR: Translation-Guided Learning with Parallel Gated Cross Attention for Low-Resource Automatic Speech Recognition

arXiv - AI, February 26, 2026

Summary

The paper presents TG-ASR, a translation-guided framework for improving automatic speech recognition in low-resource languages, specifically Taiwanese Hokkien, using parallel gated cross-attention mechanisms.

Why It Matters

This research addresses a central obstacle in low-resource automatic speech recognition: many languages lack transcribed training data. By using translations in other languages as guidance, the approach improves ASR performance where transcripts are scarce, and it could extend to other underrepresented languages, potentially improving accessibility for their speakers.

Key Takeaways

TG-ASR leverages multilingual translation embeddings to enhance ASR for low-resource languages.
The parallel gated cross-attention (PGCA) mechanism adaptively integrates embeddings from several auxiliary languages into the ASR decoder while minimizing interference between them.
A new corpus, YT-THDC, was introduced to support research in Taiwanese Hokkien ASR.
The framework achieved a 14.77% relative reduction in character error rate, demonstrating its effectiveness.
The approach could serve as a template for improving ASR systems for other underrepresented languages.
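
For readers unfamiliar with the metric in the takeaways above, the following is a minimal sketch of how character error rate (CER) and a relative reduction are conventionally computed. The function names and the sample numbers are illustrative assumptions; they are not taken from the paper, which reports only the 14.77% relative figure in this summary.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative reduction in percent: 100 * (baseline - improved) / baseline."""
    return 100.0 * (baseline - improved) / baseline


# Hypothetical CER values, chosen only to illustrate the arithmetic.
baseline_cer = 20.0
improved_cer = 17.0
print(f"relative CER reduction: {relative_reduction(baseline_cer, improved_cer):.2f}%")
```

A relative reduction is measured against the baseline error, so the same absolute improvement counts for more when the baseline error is already low.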

Electrical Engineering and Systems Science > Audio and Speech Processing
arXiv:2602.22039 (eess)
[Submitted on 25 Feb 2026]

Title: TG-ASR: Translation-Guided Learning with Parallel Gated Cross Attention for Low-Resource Automatic Speech Recognition

Authors: Cheng-Yeh Yang, Chien-Chun Wang, Li-Wei Chen, Hung-Shin Lee, Hsin-Min Wang, Berlin Chen

Abstract: Low-resource automatic speech recognition (ASR) continues to pose significant challenges, primarily due to the limited availability of transcribed data for numerous languages. While a wealth of spoken content is accessible in television dramas and online videos, Taiwanese Hokkien exemplifies this issue, with transcriptions often being scarce and the majority of available subtitles provided only in Mandarin. To address this deficiency, we introduce TG-ASR for Taiwanese Hokkien drama speech recognition, a translation-guided ASR framework that utilizes multilingual translation embeddings to enhance recognition performance in low-resource environments. The framework is centered around the parallel gated cross-attention (PGCA) mechanism, which adaptively integrates embeddings from various auxiliary languages into the ASR decoder. This mechanism facilitates robust cross-linguistic semantic guidance while ensuring stable...
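
The abstract describes PGCA only at a high level, so the following NumPy sketch shows one plausible reading rather than the authors' implementation. It assumes one cross-attention branch per auxiliary language, all run in parallel from the same decoder states, each scaled by its own tanh-squashed scalar gate that starts at zero (a common choice in gated cross-attention layers, not something the abstract specifies). The language names, dimensions, and parameter layout are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, memory, w_q, w_k, w_v):
    """Single-head scaled dot-product attention from queries onto memory."""
    q = queries @ w_q
    k = memory @ w_k
    v = memory @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def init_params(languages, d_model, rng):
    """Hypothetical parameter layout: one projection set and one gate per language."""
    scale = 1.0 / np.sqrt(d_model)
    return {
        lang: {
            "w_q": rng.standard_normal((d_model, d_model)) * scale,
            "w_k": rng.standard_normal((d_model, d_model)) * scale,
            "w_v": rng.standard_normal((d_model, d_model)) * scale,
            "gate": 0.0,  # assumed zero init: the branch contributes nothing at first
        }
        for lang in languages
    }


def parallel_gated_cross_attention(decoder_states, aux_embeddings, params):
    """Add each language's gated attention output to the decoder states.

    Every branch reads the same decoder states (parallel, not stacked),
    so one language's branch never attends over another's output.
    """
    out = decoder_states.copy()
    for lang, memory in aux_embeddings.items():
        p = params[lang]
        attended = cross_attention(decoder_states, memory,
                                   p["w_q"], p["w_k"], p["w_v"])
        out = out + np.tanh(p["gate"]) * attended
    return out


rng = np.random.default_rng(0)
d_model = 8
languages = ["mandarin", "english"]  # hypothetical auxiliary languages
params = init_params(languages, d_model, rng)
decoder_states = rng.standard_normal((5, d_model))  # 5 decoder positions
aux_embeddings = {
    "mandarin": rng.standard_normal((7, d_model)),  # 7 translation tokens
    "english": rng.standard_normal((9, d_model)),   # 9 translation tokens
}
out = parallel_gated_cross_attention(decoder_states, aux_embeddings, params)
print(out.shape)
```

With the gates at zero the layer passes decoder states through unchanged, which is one way a gated design could keep training stable while the model learns how much to rely on each auxiliary language.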

Related Articles

[2603.14841] Real-Time Driver Safety Scoring Through Inverse Crash Probability Modeling

[2603.17839] How do LLMs Compute Verbal Confidence

[2603.15970] 100x Cost & Latency Reduction: Performance Analysis of AI Query Approximation using Lightweight Proxy Models

[2603.09085] Not All News Is Equal: Topic- and Event-Conditional Sentiment from Finetuned LLMs for Aluminum Price Forecasting
