[2602.18104] MeanVoiceFlow: One-step Nonparallel Voice Conversion with Mean Flows

arXiv - Machine Learning · 4 min read

Summary

MeanVoiceFlow introduces a one-step nonparallel voice conversion model that enhances speech quality and speaker similarity while reducing conversion time.

Why It Matters

This research addresses a key limitation of existing diffusion- and flow-matching-based voice conversion models: their reliance on slow, iterative inference. By using mean flows together with new training techniques, it offers a substantially more efficient alternative for audio processing, with potential applications in speech synthesis and AI-driven communication.

Key Takeaways

  • MeanVoiceFlow achieves high-quality voice conversion in a single step.
  • The model employs average velocity for more accurate time integration.
  • A structural margin reconstruction loss stabilizes training.
  • Conditional diffused-input training enhances model performance.
  • Experimental results show comparable performance to multi-step models.
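The "average velocity" idea in the takeaways can be written out explicitly. The sketch below follows the general mean-flows formulation rather than the paper's exact notation: $v(x_t, t)$ is the instantaneous velocity of conventional flow matching, and $u$ is its average over an interval, which lets a single network evaluation replace the time integral along the inference path.

```latex
% Average velocity over [r, t] (the quantity mean flows train a network to predict):
u(x_t, r, t) = \frac{1}{t - r} \int_{r}^{t} v(x_\tau, \tau)\, d\tau

% Exact displacement along the path, with no discretization error:
x_r = x_t - (t - r)\, u(x_t, r, t)

% One-step inference from the terminal point x_1 directly to x_0:
x_0 = x_1 - u(x_1, 0, 1)
```

Because $u$ already encodes the integral, the one-step update is exact whenever the network predicts the true average velocity, whereas an Euler step with the instantaneous $v$ is only a first-order approximation.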

Computer Science > Sound
arXiv:2602.18104 (cs) · Submitted on 20 Feb 2026

Title: MeanVoiceFlow: One-step Nonparallel Voice Conversion with Mean Flows
Authors: Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Yuto Kondo

Abstract: In voice conversion (VC) applications, diffusion and flow-matching models have exhibited exceptional speech quality and speaker similarity performances. However, they are limited by slow conversion owing to their iterative inference. Consequently, we propose MeanVoiceFlow, a novel one-step nonparallel VC model based on mean flows, which can be trained from scratch without requiring pretraining or distillation. Unlike conventional flow matching that uses instantaneous velocity, mean flows employ average velocity to more accurately compute the time integral along the inference path in a single step. However, training the average velocity requires its derivative to compute the target velocity, which can cause instability. Therefore, we introduce a structural margin reconstruction loss as a zero-input constraint, which moderately regularizes the input-output behavior of the model without harmful statistical averaging. Furthermore, we propose conditional diffused-input training in which a mixture of noise and source data is used as input to the model during both training and inference. This e...
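As a toy illustration of why average velocity enables one-step inference, the sketch below contrasts multi-step Euler integration of an instantaneous velocity field with a single mean-flow-style update. The velocity field and both samplers are hypothetical stand-ins for trained networks, chosen so the average velocity has a closed form; this is not the paper's implementation.

```python
import math

def instantaneous_velocity(x, t):
    # Toy linear field dx/dt = -x, standing in for the flow-matching
    # network's instantaneous velocity v(x, t).
    return -x

def average_velocity(x_t, r, t):
    # Average velocity u(x_t, r, t) = (x_t - x_r) / (t - r) for the toy
    # field above. For dx/dt = -x we have x_r = x_t * exp(t - r), so the
    # average is available in closed form; this plays the role of the
    # learned mean-flow network.
    return x_t * (1.0 - math.exp(t - r)) / (t - r)

def euler_sampler(x1, steps):
    # Conventional multi-step inference: repeated Euler updates with the
    # instantaneous field, integrating backward from t = 1 to t = 0.
    x, t = x1, 1.0
    dt = 1.0 / steps
    for _ in range(steps):
        x = x - dt * instantaneous_velocity(x, t)
        t -= dt
    return x

def one_step_sampler(x1):
    # Mean-flow-style inference: a single update x0 = x1 - (1 - 0) * u(x1, 0, 1),
    # exact whenever u is the true average velocity.
    return x1 - (1.0 - 0.0) * average_velocity(x1, 0.0, 1.0)
```

For this toy field the exact solution is x0 = x1 * e, so the one-step sampler is exact while the Euler sampler only approaches that value as the step count grows, which mirrors the speed/accuracy trade-off the paper targets.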
