[2602.24245] Chunk-wise Attention Transducers for Fast and Accurate Streaming Speech-to-Text
Computer Science > Machine Learning
arXiv:2602.24245 (cs) [Submitted on 27 Feb 2026]
Title: Chunk-wise Attention Transducers for Fast and Accurate Streaming Speech-to-Text
Authors: Hainan Xu, Vladimir Bataev, Travis M. Bartley, Jagadeesh Balam
Abstract: We propose the Chunk-wise Attention Transducer (CHAT), a novel extension to RNN-T models that processes audio in fixed-size chunks while employing cross-attention within each chunk. This hybrid approach maintains RNN-T's streaming capability while introducing controlled flexibility for local alignment modeling. CHAT significantly reduces the temporal dimension that RNN-T must handle, yielding substantial efficiency improvements: up to 46.2% reduction in peak training memory, up to 1.36X faster training, and up to 1.69X faster inference. Alongside these efficiency gains, CHAT achieves consistent accuracy improvements over RNN-T across multiple languages and tasks -- up to 6.3% relative WER reduction for speech recognition and up to 18.0% BLEU improvement for speech translation. The method proves particularly effective for speech translation, where RNN-T's strict monotonic alignment hurts performance. Our results demonstrate that the CHAT model offers a practical solution for deploying more capable streaming speech models without sacrificing real-time constraints.
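The abstract's core efficiency claim is that attending within fixed-size chunks shrinks the time axis the transducer joint must cover. The paper does not detail how each chunk is summarized, so the sketch below makes an illustrative assumption: each chunk of encoder frames is collapsed to a single vector by attention pooling with a per-chunk query (the function `chunkwise_attention_pool` and the projections `Wq`, `Wk` are hypothetical, not from the paper).

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def chunkwise_attention_pool(enc, chunk_size, Wq, Wk):
    """Collapse each fixed-size chunk of encoder frames to one vector
    via attention, shrinking the time axis by a factor of chunk_size.
    Illustrative assumption only -- not the paper's exact mechanism."""
    T, D = enc.shape
    assert T % chunk_size == 0  # a real system would pad the last chunk
    chunks = enc.reshape(T // chunk_size, chunk_size, D)
    # hypothetical per-chunk query: projected mean of the chunk's frames
    q = chunks.mean(axis=1) @ Wq                      # (T/C, D)
    k = chunks @ Wk                                   # (T/C, C, D)
    scores = np.einsum('nd,ncd->nc', q, k) / np.sqrt(D)
    w = softmax(scores, axis=-1)                      # attention within chunk
    return np.einsum('nc,ncd->nd', w, chunks)         # (T/C, D)

rng = np.random.default_rng(0)
T, D, C = 8, 4, 4
enc = rng.normal(size=(T, D))
Wq = rng.normal(size=(D, D))
Wk = rng.normal(size=(D, D))
out = chunkwise_attention_pool(enc, C, Wq, Wk)
print(out.shape)  # time axis reduced from 8 frames to 2 chunk summaries
```

With chunk size C, the transducer lattice over (time, label) shrinks from T x U to (T/C) x U states, which is consistent with the reported reductions in training memory and in training/inference time; attention inside each chunk restores the local alignment flexibility that strict frame-level monotonicity removes.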