[D] 60% MatMul Performance Bug in cuBLAS on RTX 5090
cuBLAS dispatches an inefficient kernel for every batched FP32 GEMM workload I tried, from 256×256 up to 8192×8192×8, reaching only ~40% of the available compute on RTX GPUs. I tested on an RTX 5090, but likely all non-Pro RTX GPUs are affected. This is with the latest CUDA 13.2.51, cuBLAS 13.3.0, and driver 595.58.03; previous versions are even worse. I wrote a simple yet efficient kernel and compared it to cuBLAS across a variety of workloads.

[Chart: Batched perf vs. cuBLAS on the 5090 (>100% means my kernel is faster)]
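As a minimal sketch of where a "~40% of peak" figure comes from: divide the GEMM's arithmetic work (2·M·N·K·batch FLOPs) by the kernel's elapsed time, then by the GPU's peak FP32 throughput. The problem size, elapsed time, and the ~104.8 TFLOPS peak used below are illustrative assumptions, not the author's measurements:

```python
def gemm_flops(m, n, k, batch=1):
    """FLOPs for a batched GEMM: each output element needs k multiply-adds."""
    return 2 * m * n * k * batch

# Illustrative assumptions (not measured values from the post):
peak_tflops = 104.8   # approximate FP32 peak of an RTX 5090-class GPU
elapsed_s = 0.0262    # hypothetical kernel time for the batch

flops = gemm_flops(4096, 4096, 4096, batch=8)
achieved_tflops = flops / elapsed_s / 1e12
utilization = achieved_tflops / peak_tflops
print(f"{achieved_tflops:.1f} TFLOPS, {utilization:.0%} of peak")
```

The same arithmetic, run against cuBLAS timings and against a custom kernel's timings, yields the relative numbers in the chart.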