[2602.21647] Mitigating Structural Noise in Low-Resource S2TT: An Optimized Cascaded Nepali-English Pipeline with Punctuation Restoration
Summary
This paper presents an optimized cascaded Nepali-English speech-to-text translation system that mitigates structural noise from ASR, enhancing translation quality through a punctuation restoration module.
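The cascade described above can be pictured as three stages chained together: speech recognition, punctuation restoration, then translation. The following is a minimal sketch of that composition only. The names `make_cascade`, `toy_asr`, `toy_prm`, and `toy_nmt` are hypothetical stand-ins invented for illustration; they are not the paper's models or code, and a real system would call model inference inside each stage.

```python
from typing import Callable

# A text-to-text stage (punctuation restoration or translation).
TextStage = Callable[[str], str]


def make_cascade(
    asr: Callable[[bytes], str], prm: TextStage, nmt: TextStage
) -> Callable[[bytes], str]:
    """Compose ASR -> punctuation restoration -> translation into one callable."""

    def translate_speech(audio: bytes) -> str:
        transcript = asr(audio)       # unpunctuated source-language text
        punctuated = prm(transcript)  # restore sentence-final punctuation
        return nmt(punctuated)        # translate the punctuated text

    return translate_speech


# Hypothetical toy stages so the sketch runs without any models.
def toy_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def toy_prm(text: str) -> str:
    return text if text.endswith(".") else text + "."


def toy_nmt(text: str) -> str:
    return f"<en> {text}"


pipeline = make_cascade(toy_asr, toy_prm, toy_nmt)
result = pipeline(b"namaste")
```

Keeping each stage behind a plain function boundary makes it easy to run the same audio with and without the middle stage, which is the kind of comparison the paper reports.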
Why It Matters
The research addresses challenges in low-resource speech translation, specifically for Nepali, by improving the quality of a cascaded speech-to-text translation system. The findings show how much punctuation matters for translation quality, offering insights for natural language processing in other low-resource languages.
Key Takeaways
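- An optimized cascaded Nepali-English S2TT system was built from a Wav2Vec2-XLS-R-300m ASR model (2.72% CER on OpenSLR-54) and a multi-stage fine-tuned MarianMT model (28.32 BLEU on FLORES-200).
- Punctuation loss significantly degrades translation quality: unpunctuated ASR output causes a 20.7% relative BLEU drop on the FLORES benchmark.
- Applying a Punctuation Restoration Module (PRM) directly to ASR output yielded a 4.90 BLEU point gain on a custom dataset.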
- Human assessments confirmed the superiority of the optimized pipeline in terms of adequacy and fluency.
- This research establishes a baseline for future developments in low-resource language translation systems.
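The takeaways mix two kinds of numbers: a relative drop (a percentage of the starting score) and an absolute gain (BLEU points). The sketch below shows the arithmetic that separates them. It assumes, purely for illustration, that the 20.7% relative drop is measured against the 28.32 BLEU reported for the NMT model on FLORES-200; the source does not state the unpunctuated score, so the implied value is not a reported result. The function names are hypothetical.

```python
def relative_drop(baseline: float, degraded: float) -> float:
    """Fraction of the baseline score that was lost."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - degraded) / baseline


def score_after_relative_drop(baseline: float, drop: float) -> float:
    """Score that remains after losing `drop` (a fraction) of the baseline."""
    return baseline * (1.0 - drop)


# ASSUMPTION: 28.32 BLEU is the baseline for the 20.7% relative drop.
baseline_bleu = 28.32
implied_bleu = score_after_relative_drop(baseline_bleu, 0.207)  # roughly 22.46
absolute_loss = baseline_bleu - implied_bleu                    # roughly 5.86 points
```

The 4.90 point gain was measured on a custom dataset rather than on FLORES, so it should not be subtracted from or compared directly with the figures above.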
Computer Science > Computation and Language
arXiv:2602.21647 (cs)
[Submitted on 25 Feb 2026]
Title: Mitigating Structural Noise in Low-Resource S2TT: An Optimized Cascaded Nepali-English Pipeline with Punctuation Restoration
Authors: Tangsang Chongbang, Pranesh Pyara Shrestha, Amrit Sarki, Anku Jaiswal
Abstract: This paper presents and evaluates an optimized cascaded Nepali speech-to-English text translation (S2TT) system, focusing on mitigating structural noise introduced by Automatic Speech Recognition (ASR). We first establish highly proficient ASR and NMT components: a Wav2Vec2-XLS-R-300m model achieved a state-of-the-art 2.72% CER on OpenSLR-54, and a multi-stage fine-tuned MarianMT model reached a 28.32 BLEU score on the FLORES-200 benchmark. We empirically investigate the influence of punctuation loss, demonstrating that unpunctuated ASR output significantly degrades translation quality, causing a massive 20.7% relative BLEU drop on the FLORES benchmark. To overcome this, we propose and evaluate an intermediate Punctuation Restoration Module (PRM). The final S2TT pipeline was tested across three configurations on a custom dataset. The optimal configuration, which applied the PRM directly to ASR output, achieved a 4.90 BLEU point gain over the di...
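The abstract reports ASR quality as character error rate (CER). For readers unfamiliar with the metric, here is a minimal sketch of the standard definition: character-level edit distance divided by the reference length. This is not the authors' evaluation script. It also assumes that one Python code point counts as one character; for Devanagari text, combining vowel signs are then counted separately, and evaluation toolkits may normalize or segment differently.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference character
                curr[j - 1] + 1,     # insertion of a hypothesis character
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edits needed, divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


example = cer("abcd", "abxd")  # one substitution over four characters
```

Under this definition, a CER of 2.72% corresponds to fewer than 3 character edits per 100 reference characters on average.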