[2602.18104] MeanVoiceFlow: One-step Nonparallel Voice Conversion with Mean Flows

arXiv - Machine Learning · 4 min read

Summary

MeanVoiceFlow introduces a one-step nonparallel voice conversion model that enhances speech quality and speaker similarity while reducing conversion time.

Why It Matters

This research addresses a key limitation of existing diffusion- and flow-matching-based voice conversion models: their reliance on slow, iterative inference. By using mean flows together with new training techniques, it offers a substantially more efficient alternative for audio processing, with potential applications in speech synthesis and AI-driven communication.

Key Takeaways

  • MeanVoiceFlow achieves high-quality voice conversion in a single step.
  • The model employs average velocity for more accurate time integration.
  • A structural margin reconstruction loss stabilizes training.
  • Conditional diffused-input training enhances model performance.
  • Experimental results show comparable performance to multi-step models.
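The "average velocity" idea in the takeaways can be written out explicitly. The sketch below follows the general mean-flows formulation rather than the paper's exact notation: $v(x_t, t)$ is the instantaneous velocity of conventional flow matching, and $u$ is its average over an interval, which lets a single network evaluation replace the time integral along the inference path.

```latex
% Average velocity over [r, t] (the quantity mean flows train a network to predict):
u(x_t, r, t) = \frac{1}{t - r} \int_{r}^{t} v(x_\tau, \tau)\, d\tau

% Exact displacement along the path, with no discretization error:
x_r = x_t - (t - r)\, u(x_t, r, t)

% One-step inference from the terminal point x_1 directly to x_0:
x_0 = x_1 - u(x_1, 0, 1)
```

Because $u$ already encodes the integral, the one-step update is exact whenever the network predicts the true average velocity, whereas an Euler step with the instantaneous $v$ is only a first-order approximation.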

Computer Science > Sound
arXiv:2602.18104 (cs) · Submitted on 20 Feb 2026

Title: MeanVoiceFlow: One-step Nonparallel Voice Conversion with Mean Flows
Authors: Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Yuto Kondo

Abstract: In voice conversion (VC) applications, diffusion and flow-matching models have exhibited exceptional speech quality and speaker similarity performances. However, they are limited by slow conversion owing to their iterative inference. Consequently, we propose MeanVoiceFlow, a novel one-step nonparallel VC model based on mean flows, which can be trained from scratch without requiring pretraining or distillation. Unlike conventional flow matching that uses instantaneous velocity, mean flows employ average velocity to more accurately compute the time integral along the inference path in a single step. However, training the average velocity requires its derivative to compute the target velocity, which can cause instability. Therefore, we introduce a structural margin reconstruction loss as a zero-input constraint, which moderately regularizes the input-output behavior of the model without harmful statistical averaging. Furthermore, we propose conditional diffused-input training in which a mixture of noise and source data is used as input to the model during both training and inference. This e...
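As a toy illustration of why average velocity enables one-step inference, the sketch below contrasts multi-step Euler integration of an instantaneous velocity field with a single mean-flow-style update. The velocity field and both samplers are hypothetical stand-ins for trained networks, chosen so the average velocity has a closed form; this is not the paper's implementation.

```python
import math

def instantaneous_velocity(x, t):
    # Toy linear field dx/dt = -x, standing in for the flow-matching
    # network's instantaneous velocity v(x, t).
    return -x

def average_velocity(x_t, r, t):
    # Average velocity u(x_t, r, t) = (x_t - x_r) / (t - r) for the toy
    # field above. For dx/dt = -x we have x_r = x_t * exp(t - r), so the
    # average is available in closed form; this plays the role of the
    # learned mean-flow network.
    return x_t * (1.0 - math.exp(t - r)) / (t - r)

def euler_sampler(x1, steps):
    # Conventional multi-step inference: repeated Euler updates with the
    # instantaneous field, integrating backward from t = 1 to t = 0.
    x, t = x1, 1.0
    dt = 1.0 / steps
    for _ in range(steps):
        x = x - dt * instantaneous_velocity(x, t)
        t -= dt
    return x

def one_step_sampler(x1):
    # Mean-flow-style inference: a single update x0 = x1 - (1 - 0) * u(x1, 0, 1),
    # exact whenever u is the true average velocity.
    return x1 - (1.0 - 0.0) * average_velocity(x1, 0.0, 1.0)
```

For this toy field the exact solution is x0 = x1 * e, so the one-step sampler is exact while the Euler sampler only approaches that value as the step count grows, which mirrors the speed/accuracy trade-off the paper targets.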
