[2509.06027] DreamAudio: Customized Text-to-Audio Generation with Diffusion Models
Computer Science > Sound

arXiv:2509.06027 (cs) [Submitted on 7 Sep 2025 (v1), last revised 24 Mar 2026 (this version, v2)]

Title: DreamAudio: Customized Text-to-Audio Generation with Diffusion Models

Authors: Yi Yuan, Xubo Liu, Haohe Liu, Xiyuan Kang, Zhuo Chen, Yuxuan Wang, Mark D. Plumbley, Wenwu Wang

Abstract: With the development of large-scale diffusion-based and language-modeling-based generative models, impressive progress has been achieved in text-to-audio generation. Despite producing high-quality outputs, existing text-to-audio models mainly aim to generate semantically aligned sound and fall short of controlling the fine-grained acoustic characteristics of specific sounds. As a result, users who need specific sound content may find it difficult to generate the desired audio clips. In this paper, we present DreamAudio for customized text-to-audio generation (CTTA). Specifically, we introduce a new framework designed to enable the model to identify auditory information from user-provided reference concepts for audio generation. Given a few reference audio samples containing personalized audio events, our system can generate new audio samples that include these specific events. In addition, two types of datasets are developed for training and testing the proposed systems. The experiments show that DreamAu...