[2601.12494] Multi-Task Instruction Tuning via Data Scheduling for Low-Resource Arabic AudioLLMs
Computer Science > Sound

arXiv:2601.12494 (cs)

[Submitted on 18 Jan 2026 (v1), last revised 23 Mar 2026 (this version, v2)]

Title: Multi-Task Instruction Tuning via Data Scheduling for Low-Resource Arabic AudioLLMs

Authors: Hunzalah Hassan Bhatti, Firoj Alam, Shammur Absar Chowdhury

Abstract: Audio large language models (LLMs) enable unified speech understanding and generation, but adapting them to linguistically complex and dialect-rich settings such as Arabic-English remains challenging. We present a controlled study of multi-task instruction tuning for an Arabic-centric audio LLM across generative tasks, including ASR and speech and text summarization, and discriminative tasks, including dialect and emotion recognition, in a resource-constrained setting. To support end-to-end Arabic speech summarization, we introduce AraMega-SSum, the first speech summarization resource for training and benchmarking Arabic-centric audio LLMs. We compare four training strategies: (i) Uniform Task Mixing, (ii) Task-Progressive Curriculum (TPC), (iii) Aligner-Based Diverse Sampling (ADS) for training-time batch construction, and (iv) a two-stage TPC→ADS strategy. Our results show a clear efficiency-robustness trade-off: ADS speeds up early convergence and improves paralinguistic performance; however, it hurts other tasks. A two...
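To make the four scheduling strategies concrete, here is a minimal Python sketch of training-time batch construction under the first three. Everything in it is an illustrative assumption, not the paper's implementation: the task names, the equal-length staging in the curriculum, the candidate pool size, and the `embed` function standing in for the aligner representations are all hypothetical.

```python
import math
import random

# Hypothetical task pools keyed by task name; in the paper these would hold
# instruction-tuning examples for ASR, speech/text summarization, and
# dialect/emotion recognition. All scheduling details below are assumptions.
TASKS = ["asr", "speech_summ", "text_summ", "dialect_id", "emotion_rec"]

def uniform_task_mixing(pools, n_batches, batch_size, rng):
    """(i) Uniform Task Mixing: every batch draws tasks with equal probability."""
    for _ in range(n_batches):
        yield [rng.choice(pools[rng.choice(TASKS)]) for _ in range(batch_size)]

def task_progressive_curriculum(pools, n_batches, batch_size, rng, order=tuple(TASKS)):
    """(ii) TPC: unlock tasks in stages, widening the sampling pool over time.
    The equal-length staging here is an assumed schedule, not the paper's."""
    for step in range(n_batches):
        k = min(len(order), 1 + step * len(order) // n_batches)
        active = order[:k]
        yield [rng.choice(pools[rng.choice(active)]) for _ in range(batch_size)]

def aligner_diverse_sampling(pools, embed, n_batches, batch_size, rng):
    """(iii) ADS: greedily assemble each batch to maximize diversity in an
    embedding space; the hypothetical `embed` callable stands in for the
    aligner representations the paper's name suggests."""
    items = [ex for t in TASKS for ex in pools[t]]
    for _ in range(n_batches):
        batch = [rng.choice(items)]
        while len(batch) < batch_size:
            cands = rng.sample(items, min(32, len(items)))
            # Farthest-point heuristic: pick the candidate most distant
            # from everything already in the batch.
            pick = max(cands, key=lambda c: min(
                math.dist(embed(c), embed(b)) for b in batch))
            batch.append(pick)
        yield batch

# (iv) The two-stage TPC→ADS strategy would run task_progressive_curriculum
# for an initial budget and then switch to aligner_diverse_sampling.

# Toy usage: strings as examples, a trivial stand-in "embedding".
rng = random.Random(0)
pools = {t: [f"{t}-ex{i}" for i in range(50)] for t in TASKS}
embed = lambda ex: [float(len(ex)), float(hash(ex) % 97)]
for batch in uniform_task_mixing(pools, n_batches=2, batch_size=4, rng=rng):
    print(batch)
```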