[D] - 1M tokens/second serving Qwen 3.5 27B on B200 GPUs, benchmark results and findings

Reddit - Machine Learning 1 min read

About this article

Wrote up the process of pushing Qwen 3.5 27B (dense, FP8) to 1.1M total tok/s on 96 B200 GPUs with vLLM v0.18.0. DP=8 nearly 4x'd throughput over TP=8. Model is too small for tensor parallelism to help on B200s. MTP-1 mattered more than anything else (GPU utilization was 0% without it). MTP-5 crashed with cudaErrorIllegalAddress. 97.1% scaling efficiency at 8 nodes, 96.5% at 12. TPOT flat at ~46ms regardless of node count. Inference Gateway (KV-cache-aware routing) added ~35% overhead vs Clus...

You've been blocked by network security.To continue, log in to your Reddit account or use your developer tokenIf you think you've been blocked by mistake, file a ticket below and we'll look into it.Log in File a ticket

Originally published on March 26, 2026. Curated by AI News.

Related Articles

Llms

[P] ClaudeFormer: Building a Transformer Out of Claudes — Collaboration Request

I'm looking to work with people interested in math, machine learning, or agentic coding, on creating a multi-agent framework to do fronti...

Reddit - Machine Learning · 1 min ·
I Asked ChatGPT 500 Questions. Here Are the Ads I Saw Most Often | WIRED
Llms

I Asked ChatGPT 500 Questions. Here Are the Ads I Saw Most Often | WIRED

Ads are rolling out across the US on ChatGPT’s free tier. I asked OpenAI's bot 500 questions to see what these ads were like and how they...

Wired - AI · 9 min ·
Llms

Abacus.Ai Claw LLM consumes an incredible amount of credit without any usage :(

Three days ago, I clicked the "Deploy OpenClaw In Seconds" button to get an overview of the new service, but I didn't build any automatio...

Reddit - Artificial Intelligence · 1 min ·
Google’s Gemini AI app debuts in Hong Kong
Llms

Google’s Gemini AI app debuts in Hong Kong

Tech giant’s chatbot service tops Apple’s app store chart in the city.

AI Tools & Products · 2 min ·
More in Llms: This Week Guide Trending

No comments

No comments yet. Be the first to comment!

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime