[2505.01448] OpenAVS: Training-Free Open-Vocabulary Audio Visual Segmentation with Foundational Models

Computer Science > Machine Learning

arXiv:2505.01448 (cs)
Submitted on 30 Apr 2025 (v1), last revised 30 Mar 2026 (this version, v2)

Title: OpenAVS: Training-Free Open-Vocabulary Audio Visual Segmentation with Foundational Models

Authors: Shengkai Chen, Yifang Yin, Jinming Cao, Shili Xiang, Zhenguang Liu, Roger Zimmermann

Abstract: Audio-visual segmentation aims to separate sounding objects from videos by predicting pixel-level masks based on audio signals. Existing methods primarily concentrate on closed-set scenarios and direct audio-visual alignment and fusion, which limits their capability to generalize to new, unseen situations. In this paper, we propose OpenAVS, a novel training-free language-based approach that, for the first time, effectively aligns audio and visual modalities using text as a proxy for open-vocabulary Audio-Visual Segmentation (AVS). Equipped with multimedia foundation models, OpenAVS directly infers masks through 1) audio-to-text prompt generation, 2) LLM-guided prompt translation, and 3) text-to-visual sounding object segmentation. The objective of OpenAVS is to establish a simple yet flexible architecture that relies on the most appropriate foundation models by fully leveraging their capabilities to enable more effective knowledge transfer to the downstream AVS ...
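
The three stages named in the abstract (audio-to-text prompt generation, LLM-guided prompt translation, text-to-visual segmentation) can be read as a composition of three swappable components, with text as the only interface between them. The sketch below illustrates that data flow only. It is not the authors' implementation: the class name `TextProxyAVS`, the three callables, and the toy stand-ins are all hypothetical, and a real system would plug an audio-language model, an LLM, and a text-prompted segmentation model into the same slots.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Mask = List[List[int]]            # binary mask: rows of 0/1
Frame = Sequence[Sequence[str]]   # toy "image": each cell holds an object label


@dataclass
class TextProxyAVS:
    """Hypothetical three-stage pipeline; every stage is a swappable callable."""
    audio_to_text: Callable[[Sequence[float]], List[str]]   # stage 1
    translate_prompts: Callable[[List[str]], List[str]]     # stage 2
    segment_by_text: Callable[[Frame, List[str]], Mask]     # stage 3

    def __call__(self, audio: Sequence[float], frame: Frame) -> Mask:
        sound_tags = self.audio_to_text(audio)           # e.g. ["barking"]
        prompts = self.translate_prompts(sound_tags)     # e.g. ["dog"]
        return self.segment_by_text(frame, prompts)      # pixel-level mask


# Toy stand-ins (assumptions for illustration, not real foundation models).
def toy_audio_tagger(audio: Sequence[float]) -> List[str]:
    # Pretend any loud clip is a dog barking; a silent clip yields no tags.
    return ["barking"] if max(audio, default=0.0) > 0.5 else []


SOUND_TO_OBJECT = {"barking": "dog", "meowing": "cat"}


def toy_translator(tags: List[str]) -> List[str]:
    # Map a sound description to the visible object that makes the sound.
    return [SOUND_TO_OBJECT.get(tag, tag) for tag in tags]


def toy_segmenter(frame: Frame, prompts: List[str]) -> Mask:
    # Mark every cell whose label matches one of the text prompts.
    wanted = set(prompts)
    return [[1 if cell in wanted else 0 for cell in row] for row in frame]


pipeline = TextProxyAVS(toy_audio_tagger, toy_translator, toy_segmenter)
frame = [["dog", "sky"], ["grass", "dog"]]
mask = pipeline([0.1, 0.9], frame)
silent_mask = pipeline([0.0, 0.0], frame)
```

Because no stage is trained jointly with another, any one of the three callables can be replaced without touching the rest, which is the flexibility the abstract attributes to the text-as-proxy design.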

Originally published on March 31, 2026. Curated by AI News.
