[2512.19703] ASK: Adaptive Self-improving Knowledge Framework for Audio Text Retrieval
Electrical Engineering and Systems Science > Audio and Speech Processing

arXiv:2512.19703 (eess)

[Submitted on 11 Dec 2025 (v1), last revised 24 Mar 2026 (this version, v2)]

Title: ASK: Adaptive Self-improving Knowledge Framework for Audio Text Retrieval

Authors: Siyuan Fu, Xuchen Guo, Mingjun Liu, Hongxiang Li, Boyin Tan, Gongxi Zhu, Xianwei Zhuang, Jinghan Ru, Yuxin Xie, Yuguo Yin

Abstract: The dominant paradigm for Audio-Text Retrieval (ATR) relies on dual-encoder architectures optimized via mini-batch contrastive learning. However, restricting optimization to local in-batch samples creates a fundamental limitation we term the Gradient Locality Bottleneck (GLB), which prevents the resolution of acoustic ambiguities and hinders the learning of rare long-tail concepts. While external knowledge injection can break this bottleneck, it often triggers a problem called Representation-Drift Mismatch (RDM), where a static knowledge base becomes misaligned with the evolving encoders, degrading guidance into noise. To address these intertwined challenges, we propose the Adaptive Self-improving Knowledge (ASK) framework. ASK breaks the GLB via multi-grained knowledge injection and mitigates RDM through a dynamic refinement strategy that synchronizes the knowledge base with the model. Additionally, an adaptive reliability we...
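The mini-batch contrastive setup in which the Gradient Locality Bottleneck arises can be sketched as a symmetric InfoNCE loss over paired audio and text embeddings. The function name, embedding shapes, and temperature value below are illustrative assumptions, not the paper's implementation; the point is only that gradients flow solely from the B in-batch samples.

```python
import numpy as np

def symmetric_infonce(audio_emb, text_emb, temperature=0.07):
    """Illustrative mini-batch contrastive loss for a dual-encoder ATR model.

    audio_emb, text_emb: (B, D) arrays; row i of each is a matched pair.
    Only the B in-batch samples contribute to the loss -- the "local"
    optimization that the Gradient Locality Bottleneck refers to.
    """
    # L2-normalize so the dot product equals cosine similarity.
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = a @ t.T / temperature          # (B, B) similarity matrix
    labels = np.arange(len(a))              # matched pairs lie on the diagonal

    def cross_entropy(lg):
        lg = lg - lg.max(axis=1, keepdims=True)   # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Symmetric: audio-to-text and text-to-audio retrieval directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

As a sanity check, the loss should be lower when rows are correctly paired than when the text batch is shuffled against the audio batch.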