[2604.02344] Characterizing WebGPU Dispatch Overhead for LLM Inference Across Four GPU Vendors, Three Backends, and Three Browsers

[2604.02344] Characterizing WebGPU Dispatch Overhead for LLM Inference Across Four GPU Vendors, Three Backends, and Three Browsers

arXiv - Machine Learning 4 min read

About this article

Abstract page for arXiv paper 2604.02344: Characterizing WebGPU Dispatch Overhead for LLM Inference Across Four GPU Vendors, Three Backends, and Three Browsers

Computer Science > Machine Learning arXiv:2604.02344 (cs) [Submitted on 9 Feb 2026] Title:Characterizing WebGPU Dispatch Overhead for LLM Inference Across Four GPU Vendors, Three Backends, and Three Browsers Authors:Jędrzej Maczan View a PDF of the paper titled Characterizing WebGPU Dispatch Overhead for LLM Inference Across Four GPU Vendors, Three Backends, and Three Browsers, by J\k{e}drzej Maczan View PDF HTML (experimental) Abstract:WebGPU's security-focused design imposes per-operation validation that compounds across the many small dispatches in neural network inference, yet the true cost of this overhead is poorly characterized. We present a systematic characterization of WebGPU dispatch overhead for LLM inference at batch size 1, spanning four GPU vendors (NVIDIA, AMD, Apple, Intel), two native implementations (Dawn, wgpu-native) and three browsers (Chrome, Safari, Firefox), and two model sizes (Qwen2.5-0.5B and 1.5B). Our primary contribution is a sequential-dispatch methodology that reveals naive single-operation benchmarks overestimate dispatch cost by ${\sim}20\times$. The true per-dispatch cost of WebGPU API overhead alone is 24-36 $\mu$s on Vulkan and 32-71 $\mu$s on Metal, while the total per-operation overhead including Python cost is ${\sim}95$~$\mu$s, which turns out to be a distinction critical for optimization. On Vulkan, kernel fusion improves throughput by 53%, while CUDA fusion provides no benefit, confirming that per-operation overhead is a primary ...

Originally published on April 06, 2026. Curated by AI News.

Related Articles

Google’s Gemini AI can answer your questions with 3D models and simulations
Llms

Google’s Gemini AI can answer your questions with 3D models and simulations

Google's latest upgrade for Gemini will allow the chatbot to generate interactive 3D models and simulations in response to your questions...

The Verge - AI · 4 min ·
Moody’s Integrates AI Agents With Anthropic’s Claude
Llms

Moody’s Integrates AI Agents With Anthropic’s Claude

AI Tools & Products · 4 min ·
AI on the couch: Anthropic gives Claude 20 hours of psychiatry
Llms

AI on the couch: Anthropic gives Claude 20 hours of psychiatry

AI Tools & Products · 6 min ·
These AI Glasses Switch Between ChatGPT and Gemini. Why Don't More Wearables Do This?
Llms

These AI Glasses Switch Between ChatGPT and Gemini. Why Don't More Wearables Do This?

AI Tools & Products · 6 min ·
More in Llms: This Week Guide Trending

No comments

No comments yet. Be the first to comment!

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime