[2603.02236] CUDABench: Benchmarking LLMs for Text-to-CUDA Generation

[2603.02236] CUDABench: Benchmarking LLMs for Text-to-CUDA Generation

arXiv - AI 3 min read

About this article

Abstract page for arXiv paper 2603.02236: CUDABench: Benchmarking LLMs for Text-to-CUDA Generation

Computer Science > Machine Learning arXiv:2603.02236 (cs) [Submitted on 13 Feb 2026] Title:CUDABench: Benchmarking LLMs for Text-to-CUDA Generation Authors:Jiace Zhu, Wentao Chen, Qi Fan, Zhixing Ren, Junying Wu, Xing Zhe Chai, Chotiwit Rungrueangwutthinon, Yehan Ma, An Zou View a PDF of the paper titled CUDABench: Benchmarking LLMs for Text-to-CUDA Generation, by Jiace Zhu and 8 other authors View PDF HTML (experimental) Abstract:Recent studies have demonstrated the potential of Large Language Models (LLMs) in generating GPU Kernels. Current benchmarks focus on the translation of high-level languages into CUDA, overlooking the more general and challenging task of text-to-CUDA generation. Furthermore, given the hardware-specific and performance-critical features of GPU programming, accurately assessing the performance of LLM-generated GPU programs is nontrivial. In this work, we introduce CUDABench, a comprehensive benchmark designed to evaluate the text-to-CUDA capabilities of LLMs. First, we construct CUDABench-Set, which covers Breadth-Depth-Difficulty evaluation space in diverse application domains, including artificial intelligence, scientific computing, and data analytics, etc. Furthermore, we propose CUDABench-Score and Generative Verification Pipeline that assess (1) compilation correctness, (2) functional consistency through execution-based verification, and (3) a novel roofline-based metric, Performance-Score. Benchmarking state-of-the-art LLMs reveals insightful...

Originally published on March 04, 2026. Curated by AI News.

Related Articles

You can now use ChatGPT with Apple’s CarPlay | The Verge
Llms

You can now use ChatGPT with Apple’s CarPlay | The Verge

ChatGPT is now accessible from your CarPlay dashboard if you have iOS 26.4 or newer and the latest version of the ChatGPT app.

The Verge - AI · 3 min ·
Llms

Have Companies Began Adopting Claude Co-Work at an Enterprise Level?

Hi Guys, My company is considering purchasing the Claude Enterprise plan. The main two constraints are: - Being able to block usage of Cl...

Reddit - Artificial Intelligence · 1 min ·
Llms

What I learned about multi-agent coordination running 9 specialized Claude agents

I've been experimenting with multi-agent AI systems and ended up building something more ambitious than I originally planned: a fully ope...

Reddit - Artificial Intelligence · 1 min ·
Llms

[D] The problem with comparing AI memory system benchmarks — different evaluation methods make scores meaningless

I've been reviewing how various AI memory systems evaluate their performance and noticed a fundamental issue with cross-system comparison...

Reddit - Machine Learning · 1 min ·
More in Llms: This Week Guide Trending

No comments

No comments yet. Be the first to comment!

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime