[2602.14785] SA-SSL-MOS: Self-supervised Learning MOS Prediction with Spectral Augmentation for Generalized Multi-Rate Speech Assessment

Summary

The paper proposes SA-SSL-MOS, a speech quality assessment method that augments a self-supervised learning (SSL) model with spectrogram features in order to estimate mean opinion scores (MOS) for multi-rate speech sampled at 16-48 kHz.

Why It Matters

SSL models used for speech quality assessment are pretrained on 16 kHz speech, so they discard the high-frequency information present at higher sampling rates. The paper targets that limitation. More accurate MOS prediction for multi-rate speech could benefit applications in telecommunications and audio processing.

Key Takeaways

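  • Introduces a spectrogram-augmented SSL method that adds high-frequency features (up to a 48 kHz sampling rate) through a parallel-branch architecture.
  • Addresses the challenge of limited MOS-labeled datasets for multi-rate speech.
  • Proposes a two-step training scheme to improve generalization: pre-training on a large 48 kHz dataset, then fine-tuning on a smaller multi-rate dataset.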
  • Highlights the importance of high-frequency information in speech assessment.
  • Experimental results indicate significant performance enhancements.

Electrical Engineering and Systems Science > Audio and Speech Processing

arXiv:2602.14785 (eess)

[Submitted on 16 Feb 2026]

Title: SA-SSL-MOS: Self-supervised Learning MOS Prediction with Spectral Augmentation for Generalized Multi-Rate Speech Assessment

Authors: Fengyuan Cao, Xinyu Liang, Fredrik Cumlin, Victor Ungureanu, Chandan K. A. Reddy, Christian Schuldt, Saikat Chatterjee

Abstract: Designing a speech quality assessment (SQA) system for estimating mean-opinion-score (MOS) of multi-rate speech with varying sampling frequency (16-48 kHz) is a challenging task. The challenge arises due to the limited availability of a MOS-labeled training dataset comprising multi-rate speech samples. While self-supervised learning (SSL) models have been widely adopted in SQA to boost performance, a key limitation is that they are pretrained on 16 kHz speech and therefore discard high-frequency information present in higher sampling rates. To address this issue, we propose a spectrogram-augmented SSL method that incorporates high-frequency features (up to 48 kHz sampling rate) through a parallel-branch architecture. We further introduce a two-step training scheme: the model is first pre-trained on a large 48 kHz dataset and then fine-tuned on a smaller multi-rate dataset. ...
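To make the 16 kHz limitation concrete, here is a minimal sketch of the frequency band that is lost when speech is resampled down to the rate an SSL model was pretrained on. It relies only on the Nyquist relation (a signal sampled at rate fs can represent frequencies up to fs/2). The function names and the intermediate rates of 24 kHz and 32 kHz are assumptions chosen for illustration and do not come from the paper.

```python
# Illustrative sketch only: the helper names and the list of sampling rates
# are assumptions for this example, not code or settings from the paper.

def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency representable at the given sampling rate."""
    return sample_rate_hz / 2.0


def discarded_band_hz(input_rate_hz: float, ssl_rate_hz: float = 16_000.0):
    """Return the (low, high) band in Hz that is lost when audio sampled at
    input_rate_hz is resampled down to ssl_rate_hz, or None if nothing is lost."""
    if input_rate_hz <= ssl_rate_hz:
        return None
    return (nyquist_hz(ssl_rate_hz), nyquist_hz(input_rate_hz))


# Print the lost band for a few rates inside the 16-48 kHz range.
for rate in (16_000, 24_000, 32_000, 48_000):
    print(rate, discarded_band_hz(rate))
```

For 48 kHz input, the band between 8 kHz and 24 kHz is what a 16 kHz front end cannot represent. That is the kind of content a separate spectrogram branch would have to supply.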
