[2602.21772] UniWhisper: Efficient Continual Multi-task Training for Robust Universal Audio Representation

Summary

UniWhisper introduces an efficient continual multi-task training framework that improves audio representations across speech, environmental sound, and music, outperforming Whisper under both MLP-probe and kNN evaluation.

Why It Matters

This research addresses the limitations of current audio encoders that excel in specific domains but struggle in others. By proposing a unified training approach, UniWhisper aims to improve the robustness of audio representations, which is crucial for applications in speech recognition, environmental sound classification, and music analysis.

Key Takeaways

  • UniWhisper employs a continual multi-task training framework for audio tasks.
  • It outperforms Whisper across 20 tasks, reaching normalized weighted averages of 0.81 (MLP probes) and 0.61 (kNN) versus 0.64 and 0.46 for Whisper.
  • The model is trained on 38k hours of public audio.
  • UniWhisper maintains strong performance in speech while improving general audio representation.
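  • The approach simplifies training by casting heterogeneous tasks into a unified instruction-and-answer format, which allows standard next-token training without task-specific heads or losses.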
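The unified instruction-and-answer idea can be illustrated with a small Python sketch. This is not the authors' code: the prompt templates, task names, the `IGNORE_INDEX` masking of instruction tokens, and the toy word-level tokenizer are hypothetical choices made for illustration. What it shows is that ASR, sound-event, and music-genre examples can all become (instruction, answer) text pairs, so a single decoder is trained with ordinary next-token cross-entropy, here assumed to be computed only on the answer tokens, with no task-specific head or loss.

```python
# Hypothetical sketch: cast heterogeneous audio tasks into one
# instruction/answer format and train with plain next-token loss.
# Templates, task names, IGNORE_INDEX masking and the toy tokenizer
# are illustrative assumptions, not UniWhisper's actual code.
from dataclasses import dataclass

import numpy as np

IGNORE_INDEX = -100  # assumed label value that is excluded from the loss

# Assumed instruction templates, one per task type.
TEMPLATES = {
    "asr": "Transcribe the speech.",
    "sound_event": "Which sound event is present?",
    "music_genre": "What is the genre of this music?",
}


@dataclass
class Example:
    task: str    # key into TEMPLATES
    answer: str  # transcript, class name, genre, ...


def to_instruction_answer(ex: Example) -> tuple[str, str]:
    """Every task becomes an (instruction, answer) text pair."""
    return TEMPLATES[ex.task], ex.answer


def build_labels(instr_ids: list[int], answer_ids: list[int]) -> tuple[list[int], list[int]]:
    """Concatenate token ids; only answer positions carry a training label."""
    input_ids = instr_ids + answer_ids
    labels = [IGNORE_INDEX] * len(instr_ids) + answer_ids
    return input_ids, labels


def next_token_loss(logits: np.ndarray, labels: list[int]) -> float:
    """Mean cross-entropy where position t predicts token t+1, skipping ignored labels.

    logits: array of shape (seq_len, vocab_size) produced by the decoder.
    """
    targets = np.array(labels[1:])
    preds = logits[:-1]
    # Numerically stable log-softmax over the vocabulary.
    shifted = preds - preds.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    keep = targets != IGNORE_INDEX
    picked = log_probs[np.nonzero(keep)[0], targets[keep]]
    return float(-picked.mean())


if __name__ == "__main__":
    vocab: dict[str, int] = {}

    def tokenize(text: str) -> list[int]:
        # Toy word-level tokenizer that grows the vocabulary on the fly.
        return [vocab.setdefault(w, len(vocab)) for w in text.lower().split()]

    instr, ans = to_instruction_answer(Example("sound_event", "dog bark"))
    input_ids, labels = build_labels(tokenize(instr), tokenize(ans))
    # Random logits stand in for decoder outputs conditioned on the audio encoder.
    logits = np.random.default_rng(0).normal(size=(len(input_ids), len(vocab)))
    print(input_ids, labels, next_token_loss(logits, labels))
```

Because every task shares one output space (text tokens), adding a new task in this sketch only means adding a template, not a new head.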

arXiv:2602.21772 (cs) [Submitted on 25 Feb 2026]

Title: UniWhisper: Efficient Continual Multi-task Training for Robust Universal Audio Representation
Authors: Yuxuan Chen, Peize He, Haoyuan Xu, Junzi Zhang

Abstract: A universal audio representation should capture fine-grained speech cues and high-level semantics for environmental sounds and music in a single encoder. Existing encoders often excel in one domain but degrade in others. We propose UniWhisper, an efficient continual multi-task training framework that casts heterogeneous audio tasks into a unified instruction and answer format. This enables standard next-token training without task-specific heads and losses. We train it on 38k hours of public audio and assess the encoder using shallow MLP probes and k-nearest neighbors (kNN) on 20 tasks spanning speech, environmental sound, and music. UniWhisper reaches normalized weighted averages of 0.81 with MLP probes and 0.61 with kNN, compared to 0.64 and 0.46 for Whisper, while retaining strong speech performance.

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
Cite as: arXiv:2602.21772 [cs.SD] (or arXiv:2602.21772v1 [cs.SD] for this version)
DOI: https://doi.org/10.48550/arXiv.2602.21772
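The kNN evaluation in the abstract treats the encoder as frozen and labels each test clip by the labels of its nearest training clips. The Python sketch below assumes clip embeddings come from mean-pooling frame-level encoder outputs and are compared by cosine similarity with a default of k=5; the abstract does not state these details, nor how per-task scores are normalized and weighted into the reported averages, so the sketch stops at per-task accuracy.

```python
# Hypothetical kNN probe on a frozen audio encoder. Mean pooling, cosine
# similarity and k=5 are assumptions; the abstract does not specify them.
from collections import Counter

import numpy as np


def clip_embedding(frames: np.ndarray) -> np.ndarray:
    """Mean-pool frame-level encoder outputs of shape (T, D) into one (D,) vector."""
    return frames.mean(axis=0)


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def knn_predict(train_emb: np.ndarray, train_labels: np.ndarray,
                test_emb: np.ndarray, k: int = 5) -> np.ndarray:
    """Predict each test label by majority vote over its k nearest training clips."""
    sims = l2_normalize(test_emb) @ l2_normalize(train_emb).T  # (n_test, n_train)
    nearest = np.argsort(-sims, axis=1)[:, :k]                 # most similar first
    preds = []
    for row in nearest:
        votes = Counter(train_labels[i] for i in row)
        # most_common keeps insertion order on ties, so the nearer clip's label wins.
        preds.append(votes.most_common(1)[0][0])
    return np.array(preds)


def knn_accuracy(train_emb: np.ndarray, train_labels: np.ndarray,
                 test_emb: np.ndarray, test_labels, k: int = 5) -> float:
    """Fraction of test clips whose kNN prediction matches the true label."""
    preds = knn_predict(train_emb, train_labels, test_emb, k)
    return float(np.mean(preds == np.asarray(test_labels)))


if __name__ == "__main__":
    # Toy 2-D "embeddings" standing in for pooled encoder outputs.
    train = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
    labels = np.array(["speech", "speech", "music", "music"])
    test = np.array([[1.0, 0.05], [0.05, 1.0]])
    print(knn_predict(train, labels, test, k=3))  # → ['speech' 'music']
```

Unlike an MLP probe, this evaluation has no trainable parameters, so it reflects how well the embedding geometry alone separates the classes.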

Related Articles

  • [2603.14841] Real-Time Driver Safety Scoring Through Inverse Crash Probability Modeling (Machine Learning)
  • [2603.17839] How do LLMs Compute Verbal Confidence (LLMs)
  • [2603.15970] 100x Cost & Latency Reduction: Performance Analysis of AI Query Approximation using Lightweight Proxy Models (LLMs)
  • [2603.09085] Not All News Is Equal: Topic- and Event-Conditional Sentiment from Finetuned LLMs for Aluminum Price Forecasting (LLMs)