[2603.00563] Whisper-MLA: Reducing GPU Memory Consumption of ASR Models based on MHA2MLA Conversion

Computer Science > Sound — arXiv:2603.00563 (cs)

[Submitted on 28 Feb 2026]

Title: Whisper-MLA: Reducing GPU Memory Consumption of ASR Models based on MHA2MLA Conversion

Authors: Sen Zhang, Jianguo Wei, Wenhuan Lu, Xianghu Yue, Wei Li, Qiang Li, Pengcheng Zhao, Ming Cai, Luo Si

Abstract: The Transformer-based Whisper model has achieved state-of-the-art performance in Automatic Speech Recognition (ASR). However, its Multi-Head Attention (MHA) mechanism incurs significant GPU memory consumption because the Key-Value (KV) cache grows linearly with sequence length, which is problematic for many applications, especially long-form audio. To address this, we introduce Whisper-MLA, a novel architecture that incorporates Multi-Head Latent Attention (MLA) into the Whisper model. Specifically, we adapt MLA to Whisper's absolute positional embeddings and systematically investigate its application across the encoder self-attention, decoder self-attention, and cross-attention modules. Empirical results indicate that applying MLA exclusively to decoder self-attention yields the best balance between performance and memory efficiency. Our approach allows a pretrained Whisper model to be converted to Whisper-MLA with minimal fine-tuning. Extensive experiments on the LibriSpeech benchmark validate the effectiveness of...
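The memory saving described in the abstract comes from what MLA caches per decoded token. The following is a minimal illustrative sketch, not the paper's implementation: standard MHA caches full per-head keys and values (2 × n_heads × d_head floats per token), while MLA-style attention caches a single low-rank latent vector (d_latent floats) and reconstructs K and V from it via up-projections. All dimensions and the names `W_down`, `W_up_k`, `W_up_v` are assumptions chosen for illustration, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy dimensions (not from the paper).
d_model, n_heads, d_head, d_latent = 512, 8, 64, 128

# Down-projection compresses the hidden state into the cached latent;
# up-projections reconstruct per-head keys and values on the fly.
W_down = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)
W_up_k = rng.standard_normal((d_latent, n_heads * d_head)) / np.sqrt(d_latent)
W_up_v = rng.standard_normal((d_latent, n_heads * d_head)) / np.sqrt(d_latent)

def cache_step(x):
    """Cache only the low-rank latent for one token's hidden state x."""
    return x @ W_down  # shape: (d_latent,)

def expand(latent):
    """Reconstruct per-head K and V from the cached latent."""
    k = (latent @ W_up_k).reshape(n_heads, d_head)
    v = (latent @ W_up_v).reshape(n_heads, d_head)
    return k, v

x = rng.standard_normal(d_model)
latent = cache_step(x)
k, v = expand(latent)

# Per-token KV-cache footprint (in floats) for each scheme.
mha_floats_per_token = 2 * n_heads * d_head  # full keys + values
mla_floats_per_token = d_latent              # one shared latent
print(f"MHA caches {mha_floats_per_token} floats/token, "
      f"MLA caches {mla_floats_per_token} "
      f"({mha_floats_per_token // mla_floats_per_token}x smaller)")
```

With these toy numbers the cache shrinks 8×; the actual ratio depends on how the compression dimension is chosen, and the paper's finding is that this trade-off pays off when applied only to Whisper's decoder self-attention.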

Originally published on March 03, 2026. Curated by AI News.
