[2512.02650] Hear What Matters! Text-conditioned Selective Video-to-Audio Generation

[2512.02650] Hear What Matters! Text-conditioned Selective Video-to-Audio Generation

arXiv - Machine Learning 3 min read

About this article

Abstract page for arXiv paper 2512.02650: Hear What Matters! Text-conditioned Selective Video-to-Audio Generation

Computer Science > Computer Vision and Pattern Recognition arXiv:2512.02650 (cs) [Submitted on 2 Dec 2025 (v1), last revised 27 Mar 2026 (this version, v2)] Title:Hear What Matters! Text-conditioned Selective Video-to-Audio Generation Authors:Junwon Lee, Juhan Nam, Jiyoung Lee View a PDF of the paper titled Hear What Matters! Text-conditioned Selective Video-to-Audio Generation, by Junwon Lee and 2 other authors View PDF HTML (experimental) Abstract:This work introduces a new task, text-conditioned selective video-to-audio (V2A) generation, which produces only the user-intended sound from a multi-object video. This capability is especially crucial in multimedia production, where audio tracks are handled individually for each sound source for precise editing, mixing, and creative control. We propose SELVA, a novel text-conditioned V2A model that treats the text prompt as an explicit selector to distinctly extract prompt-relevant sound-source visual features from the video encoder. To suppress text-irrelevant activations with efficient video encoder finetuning, the proposed supplementary tokens promote cross-attention to yield robust semantic and temporal grounding. SELVA further employs an autonomous video-mixing scheme in a self-supervised manner to overcome the lack of mono audio track supervision. We evaluate SELVA on VGG-MONOAUDIO, a curated benchmark of clean single-source videos for such a task. Extensive experiments and ablations consistently verify its effectiveness...

Originally published on March 30, 2026. Curated by AI News.

Related Articles

UMKC Announces New Master of Science in Artificial Intelligence
Ai Infrastructure

UMKC Announces New Master of Science in Artificial Intelligence

UMKC announces a new Master of Science in Artificial Intelligence program aimed at addressing workforce demand for AI expertise, set to l...

AI News - General · 4 min ·
[2603.23899] SM-Net: Learning a Continuous Spectral Manifold from Multiple Stellar Libraries
Machine Learning

[2603.23899] SM-Net: Learning a Continuous Spectral Manifold from Multiple Stellar Libraries

Abstract page for arXiv paper 2603.23899: SM-Net: Learning a Continuous Spectral Manifold from Multiple Stellar Libraries

arXiv - AI · 4 min ·
[2603.16629] MLLM-based Textual Explanations for Face Comparison
Llms

[2603.16629] MLLM-based Textual Explanations for Face Comparison

Abstract page for arXiv paper 2603.16629: MLLM-based Textual Explanations for Face Comparison

arXiv - AI · 4 min ·
[2603.15159] To See is Not to Master: Teaching LLMs to Use Private Libraries for Code Generation
Llms

[2603.15159] To See is Not to Master: Teaching LLMs to Use Private Libraries for Code Generation

Abstract page for arXiv paper 2603.15159: To See is Not to Master: Teaching LLMs to Use Private Libraries for Code Generation

arXiv - AI · 4 min ·
More in Machine Learning: This Week Guide Trending

No comments

No comments yet. Be the first to comment!

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime