[2602.00681] Audio-to-Image Bird Species Retrieval without Audio-Image Pairs via Text Distillation

arXiv - Machine Learning · April 07, 2026 · 4 min read

About this article

Abstract page for arXiv paper 2602.00681: Audio-to-Image Bird Species Retrieval without Audio-Image Pairs via Text Distillation

Computer Science > Sound, arXiv:2602.00681 (cs)
[Submitted on 31 Jan 2026 (v1), last revised 4 Apr 2026 (this version, v2)]
Authors: Ilyass Moummad, Marius Miron, Lukas Rauch, David Robinson, Alexis Joly, Olivier Pietquin, Emmanuel Chemla, Matthieu Geist

Abstract: Audio-to-image retrieval offers an interpretable alternative to audio-only classification for bioacoustic species recognition, but learning aligned audio-image representations is challenging due to the scarcity of paired audio-image data. We propose a simple and data-efficient approach that enables audio-to-image retrieval without any audio-image supervision. Our proposed method uses text as a semantic intermediary: we distill the text embedding space of a pretrained image-text model (BioCLIP-2), which encodes rich visual and taxonomic structure, into a pretrained audio-text model (BioLingual) by fine-tuning its audio encoder with a contrastive objective. This distillation transfers visually grounded semantics into the audio representation, inducing emergent alignment between audio and image embeddings without using images during training. We evaluate the resulting model on multiple bioacoustic benchmarks. The distilled audio encoder preserves ...
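The abstract describes the training signal only at a high level: a contrastive objective pulls the fine-tuned audio encoder's outputs toward the frozen BioCLIP-2 text embeddings, after which audio queries can be compared directly with BioCLIP-2 image embeddings. The NumPy sketch below illustrates that idea under several assumptions that are not taken from the paper: a symmetric InfoNCE loss with temperature 0.07, audio and text embeddings already sharing one dimension D (a projection layer would be needed otherwise), and every row in a batch being a different species. The function names (`contrastive_distillation_loss`, `retrieve_images`) are hypothetical, and random synthetic vectors stand in for real BioLingual and BioCLIP-2 outputs. It is a sketch, not the authors' implementation.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit L2 norm so dot products equal cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def _row_log_softmax(logits):
    """Numerically stable log-softmax over each row."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def contrastive_distillation_loss(audio_emb, target_text_emb, temperature=0.07):
    """Hypothetical symmetric InfoNCE loss between audio embeddings and frozen text targets.

    audio_emb:       (B, D) outputs of the audio encoder being fine-tuned.
    target_text_emb: (B, D) frozen text embeddings from the image-text model.
    Assumes row i of both arrays describes the same species and that every
    row in the batch is a different species.
    """
    a = l2_normalize(audio_emb)
    t = l2_normalize(target_text_emb)
    logits = a @ t.T / temperature  # (B, B); matching pairs sit on the diagonal
    loss_a2t = -np.mean(np.diag(_row_log_softmax(logits)))    # audio -> text
    loss_t2a = -np.mean(np.diag(_row_log_softmax(logits.T)))  # text -> audio
    return 0.5 * (loss_a2t + loss_t2a)


def retrieve_images(audio_query, image_emb, k=5):
    """Return indices of the k gallery images most similar to one audio query."""
    sims = l2_normalize(image_emb) @ l2_normalize(audio_query)
    return np.argsort(-sims)[:k]


# Synthetic demo: random vectors stand in for real model outputs.
rng = np.random.default_rng(0)
n_species, dim = 4, 64
text = rng.normal(size=(n_species, dim))                  # stand-in text targets
audio = text + 0.05 * rng.normal(size=(n_species, dim))   # well-distilled audio embeddings
images = text + 0.05 * rng.normal(size=(n_species, dim))  # image embeddings aligned with text

aligned_loss = contrastive_distillation_loss(audio, text)
# Shifting the targets by one row pairs every audio clip with the wrong species.
shuffled_loss = contrastive_distillation_loss(audio, np.roll(text, 1, axis=0))
print(f"aligned loss {aligned_loss:.4f}, mismatched loss {shuffled_loss:.4f}")
print("top-1 image for each audio query:",
      [int(retrieve_images(audio[i], images, k=1)[0]) for i in range(n_species)])
```

Because the image embeddings come from a model whose image and text spaces are already aligned, an audio encoder trained only to match the text targets ends up comparable with images as well, which is the emergent alignment the abstract describes. In real training the loss would be minimized with an autodiff framework, updating only the audio encoder while the text targets stay frozen.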

Originally published on April 07, 2026. Curated by AI News.
