[2509.25275] VoiceBridge: General Speech Restoration with One-step Latent Bridge Models


arXiv - AI · 4 min read

Summary

VoiceBridge introduces a one-step latent bridge model for general speech restoration, efficiently reconstructing 48 kHz fullband speech degraded by diverse distortions.

Why It Matters

This research addresses the limitations of existing speech enhancement models by providing a scalable solution that improves the quality of speech restoration across multiple tasks. As speech technology becomes increasingly integral to AI applications, advancements like VoiceBridge can significantly enhance user experiences in communication and media.

Key Takeaways

  • VoiceBridge utilizes a one-step latent bridge model for efficient speech restoration.
  • The model enhances waveform-latent space alignment through an energy-preserving variational autoencoder.
  • It successfully tackles diverse speech restoration tasks without the need for distillation.
  • Extensive validation shows superior performance in both in-domain and out-of-domain tasks.
  • The approach combines denoising and generative capabilities for improved audio quality.
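The "one-step latent bridge" in the first takeaway can be pictured as a stochastic interpolation pinned between the clean latent and the degraded latent; the model learns to jump from the degraded end straight back to the clean end in a single network evaluation instead of many solver steps. Below is a minimal NumPy sketch of a Brownian-bridge-style interpolation illustrating that idea; it is our own illustrative schedule, not the paper's exact formulation, and `sigma` is an assumed noise scale:

```python
import numpy as np

rng = np.random.default_rng(0)

def bridge_sample(x0, x1, t, sigma=0.1):
    """Sample the bridge state at time t in [0, 1], pinned to the
    clean latent x0 at t=0 and the degraded latent x1 at t=1.
    The noise term vanishes at both endpoints (bridge variance
    t * (1 - t)), so the path is anchored to both latents."""
    mean = (1.0 - t) * x0 + t * x1
    std = sigma * np.sqrt(t * (1.0 - t))
    return mean + std * rng.standard_normal(x0.shape)
```

During training, one would draw random `t` and teach a network to predict the clean latent `x0` from the bridge state; one-step inference then amounts to evaluating the network once at the degraded endpoint `t=1`.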

Computer Science > Sound
arXiv:2509.25275 (cs)
[Submitted on 28 Sep 2025 (v1), last revised 15 Feb 2026 (this version, v3)]

Title: VoiceBridge: General Speech Restoration with One-step Latent Bridge Models
Authors: Chi Zhang, Zehua Chen, Kaiwen Zheng, Jun Zhu

Abstract: Bridge models have been investigated in speech enhancement but are mostly single-task, with constrained general speech restoration (GSR) capability. In this work, we propose VoiceBridge, a one-step latent bridge model (LBM) for GSR, capable of efficiently reconstructing 48 kHz fullband speech from diverse distortions. To inherit the advantages of data-domain bridge models, we design an energy-preserving variational autoencoder, enhancing the waveform-latent space alignment over varying energy levels. By compressing the waveform into continuous latent representations, VoiceBridge models various GSR tasks with a single latent-to-latent generative process backed by a scalable transformer. To alleviate the challenge of reconstructing the high-quality target from distinctively different low-quality priors, we propose a joint neural prior for GSR, uniformly reducing the burden of the LBM in diverse tasks. Building upon these designs, we further investigate the bridge training objective by jointly tuning the LBM, decoder and discriminator together…
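The abstract's energy-preserving autoencoder aims to keep waveform energy and latent energy aligned across loudness levels. One simple way to realize that idea, sketched below as our own hedged interpretation rather than the paper's architecture (the `encoder` and `decoder` callables are placeholders), is to normalize the waveform to a reference RMS before encoding and undo the gain after decoding:

```python
import numpy as np

def rms(x, eps=1e-12):
    """Root-mean-square energy of a waveform."""
    return float(np.sqrt(np.mean(np.square(x)) + eps))

def encode_energy_preserving(waveform, encoder, target_rms=0.1):
    """Scale the waveform to a reference energy before encoding, so
    the latent space sees a consistent energy level regardless of the
    input loudness; return the gain so decoding can invert it."""
    gain = target_rms / rms(waveform)
    return encoder(waveform * gain), gain

def decode_energy_preserving(latent, decoder, gain):
    """Invert the normalization applied at encode time, restoring
    the original loudness of the signal."""
    return decoder(latent) / gain
```

With identity `encoder`/`decoder` functions, a waveform round-trips exactly, and the encoded signal always carries the reference RMS, which is the alignment property the abstract describes.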

Related Articles

  • University of Tartu thesis: transfer learning boosts Estonian AI models
    AI News - General · Machine Learning · 4 min
  • COD expands AI education with degree and machine learning certificate
    AI News - General · Machine Learning
  • AI literacy tops learning priorities but training efforts lag
    Employees say they lack clarity on how the technology's adoption will affect their roles and career progression, a Docebo report found.
    AI News - General · Machine Learning · 6 min
  • Scientists uncover new method to generate protein datasets for training AI
    AI News - General · Machine Learning
