[2511.15927] DiffuMamba: High-Throughput Diffusion LMs with Mamba Backbone
Computer Science > Machine Learning
arXiv:2511.15927 (cs)
[Submitted on 19 Nov 2025 (v1), last revised 27 Feb 2026 (this version, v3)]

Title: DiffuMamba: High-Throughput Diffusion LMs with Mamba Backbone
Authors: Vaibhav Singh, Oleksiy Ostapenko, Pierre-André Noël, Eugene Belilovsky, Torsten Scholak

Abstract: Diffusion language models (DLMs) have emerged as a promising alternative to autoregressive (AR) generation, yet their reliance on Transformer backbones limits inference efficiency due to quadratic attention or KV-cache overhead. We introduce DiffuMamba, a masked diffusion language model built on a bidirectional Mamba backbone that combines the diffusion objective with linear-time sequence modeling, and DiffuMamba-H, a hybrid variant with interleaved attention. Across scales up to 1.3B parameters, our models match Transformer-based diffusion in downstream performance while achieving up to 8.2x and 4.3x higher inference throughput, respectively, on long sequences. We further present a systematic analysis of inference efficiency across modern DLM variants, combining asymptotic complexity with empirical measurements. Notably, cache-efficient block diffusion with Mamba mixers emerges as the only strategy that scales linearly with sequence length and achieves the strongest performance across all baselines, suggesting a p...