[2603.20433] ALICE: A Multifaceted Evaluation Framework of Large Audio-Language Models' In-Context Learning Ability
Computer Science > Sound — arXiv:2603.20433 (cs)

[Submitted on 20 Mar 2026]

Title: ALICE: A Multifaceted Evaluation Framework of Large Audio-Language Models' In-Context Learning Ability

Authors: Yen-Ting Piao, Jay Chiehen Liao, Wei-Tang Chien, Toshiki Ogimoto, Shang-Tse Chen, Yun-Nung Chen, Chun-Yi Lee, Shao-Yuan Lo

Abstract: While Large Audio-Language Models (LALMs) have been shown to exhibit degraded instruction-following capabilities, their ability to infer task patterns from in-context examples under audio conditioning remains unstudied. To address this gap, we present ALICE, a three-stage framework that progressively reduces textual guidance to systematically evaluate LALMs' in-context learning ability under audio conditioning. Evaluating six LALMs across four audio understanding tasks under two output constraint categories, we uncover a consistent asymmetry across all stages and LALMs: in-context demonstrations reliably improve format compliance but fail to improve, and often degrade, core task performance. This suggests that LALMs can glean surface-level formatting patterns from demonstrations but may struggle to leverage cross-modal semantic grounding to reliably infer task objectives from audio-conditioned examples, highlighting potential limitations in current cross-...
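The asymmetry the abstract describes hinges on scoring format compliance and core task correctness as separate axes. A minimal sketch of such a two-axis scorer is below; the `Answer:` output pattern and the helper names are illustrative assumptions, not details of the ALICE framework itself:

```python
import re

# Hypothetical required output format; ALICE's actual output constraint
# categories are not specified here, so this regex is an assumption.
FORMAT_PATTERN = re.compile(r"^Answer:\s*(\w+)$")

def score(outputs, gold_labels):
    """Score model outputs on two separate axes:
    - format compliance: does the output match the required pattern?
    - task accuracy: is the extracted label actually correct?
    Returns (format_compliance_rate, task_accuracy)."""
    compliant = 0
    correct = 0
    for out, gold in zip(outputs, gold_labels):
        m = FORMAT_PATTERN.match(out.strip())
        if m:
            compliant += 1
            if m.group(1).lower() == gold.lower():
                correct += 1
    n = len(outputs)
    return compliant / n, correct / n
```

Under this kind of scoring, the paper's finding would show up as compliance rising with more in-context demonstrations while accuracy stays flat or drops.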