[2602.12305] OptiML: An End-to-End Framework for Program Synthesis and CUDA Kernel Optimization

arXiv - Machine Learning

Summary

OptiML is an end-to-end framework that maps natural-language intent or existing CUDA code to performance-optimized kernels, combining LLM-based program synthesis with search-based, profiler-verified optimization.

Why It Matters

As CUDA kernel optimization grows increasingly complex, OptiML addresses a critical need at the intersection of machine learning and software engineering. By combining LLM-driven code generation with systematic, profiler-verified optimization, it offers developers a way to improve kernel performance while reducing manual tuning effort.

Key Takeaways

  • OptiML maps natural-language intent or existing CUDA code to performance-optimized kernels.
  • The framework employs two decoupled stages: synthesis (OptiML-G) and search-based optimization (OptiML-X).
  • Candidate optimizations are verified through hardware-aware profiling before being accepted.
  • OptiML outperforms existing LLM baselines in kernel optimization.
  • The search yields interpretable optimization trajectories grounded in profiler data.
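The two-stage flow in the takeaways above can be sketched as a small synthesize-then-optimize loop. This is an illustrative stand-in, not the paper's implementation: the `synthesize`, `profile`, and `optimize` functions, the `Kernel` type, and the fixed 10% per-edit speedup are all hypothetical placeholders for OptiML-G, the hardware profiler, and OptiML-X respectively.

```python
from dataclasses import dataclass

@dataclass
class Kernel:
    source: str        # CUDA source text
    latency_ms: float  # measured latency (stubbed; a real system profiles on hardware)

def synthesize(intent: str) -> Kernel:
    """Stage 1 (OptiML-G analogue): map natural-language intent to an
    initial executable kernel. Stubbed with a trivial kernel here."""
    src = f"// kernel for: {intent}\n__global__ void k(float* x) {{ /* ... */ }}"
    return Kernel(source=src, latency_ms=10.0)

def profile(k: Kernel) -> float:
    """Hardware-aware reward stand-in: simply negative latency."""
    return -k.latency_ms

def optimize(k: Kernel, steps: int = 3) -> Kernel:
    """Stage 2 (OptiML-X analogue): apply candidate edits and keep
    only those the profiler verifies as improvements."""
    best = k
    for step in range(steps):
        # Pretend each proposed edit shaves 10% latency; a real edit
        # would come from an LLM and be re-compiled and re-profiled.
        candidate = Kernel(best.source + f"\n// edit {step}",
                           latency_ms=best.latency_ms * 0.9)
        if profile(candidate) > profile(best):
            best = candidate
    return best

fast = optimize(synthesize("vector add"))
print(round(fast.latency_ms, 2))  # ≈ 7.29 after three accepted edits
```

The key structural point the sketch preserves is verification: an edit is only kept when measured feedback says it helped, rather than trusting the model's proposal.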

Computer Science > Machine Learning
arXiv:2602.12305 (cs)
[Submitted on 12 Feb 2026]

Title: OptiML: An End-to-End Framework for Program Synthesis and CUDA Kernel Optimization
Authors: Arijit Bhattacharjee, Heng Ping, Son Vu Le, Paul Bogdan, Nesreen K. Ahmed, Ali Jannesari

Abstract: Generating high-performance CUDA kernels remains challenging due to the need to navigate a combinatorial space of low-level transformations under noisy and expensive hardware feedback. Although large language models can synthesize functionally correct CUDA code, achieving competitive performance requires systematic exploration and verification of optimization choices. We present OptiML, an end-to-end framework that maps either natural-language intent or input CUDA code to performance-optimized CUDA kernels by formulating kernel optimization as search under verification. OptiML consists of two decoupled stages. When the input is natural language, a Mixture-of-Thoughts generator (OptiML-G) acts as a proposal policy over kernel implementation strategies, producing an initial executable program. A search-based optimizer (OptiML-X) then refines either synthesized or user-provided kernels using Monte Carlo Tree Search over LLM-driven edits, guided by a hardware-aware reward derived from profiler feedback. Each candidate...
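The abstract describes Monte Carlo Tree Search over LLM-driven edits with a profiler-derived reward. A minimal sketch of that loop follows, with heavy hedging: the edit vocabulary, the noisy `reward` model, and the UCT constants are all illustrative assumptions; in OptiML the proposals come from an LLM and the reward from real profiler measurements.

```python
import math
import random

random.seed(0)

class Node:
    def __init__(self, edits, parent=None):
        self.edits = edits      # sequence of edits applied so far
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0        # sum of observed rewards

# Hypothetical edit vocabulary; a real system would have an LLM propose these.
EDITS = ["tile_shared_mem", "unroll_loop", "vectorize_loads", "coalesce_access"]

def untried(node):
    tried = {ch.edits[-1] for ch in node.children}
    return [e for e in EDITS if e not in tried]

def reward(edits):
    """Stand-in for profiler feedback: each distinct edit 'helps',
    with Gaussian noise mimicking noisy hardware measurements."""
    return 1.0 + 0.2 * len(set(edits)) + random.gauss(0, 0.05)

def uct(child, parent, c=1.4):
    if child.visits == 0:
        return float("inf")
    exploit = child.value / child.visits
    explore = c * math.sqrt(math.log(parent.visits) / child.visits)
    return exploit + explore

def mcts(iterations=200, max_depth=3):
    root = Node([])
    for _ in range(iterations):
        node = root
        # 1. Selection: descend by UCT while fully expanded
        while len(node.edits) < max_depth and not untried(node):
            node = max(node.children, key=lambda ch: uct(ch, node))
        # 2. Expansion: try one new edit (LLM-proposal stand-in)
        if len(node.edits) < max_depth:
            edit = random.choice(untried(node))
            child = Node(node.edits + [edit], parent=node)
            node.children.append(child)
            node = child
        # 3. Evaluation: 'profile' the candidate kernel
        r = reward(node.edits)
        # 4. Backpropagation
        while node:
            node.visits += 1
            node.value += r
            node = node.parent
    # Extract the most-visited edit sequence
    node = root
    while node.children:
        node = max(node.children, key=lambda ch: ch.visits)
    return node.edits

print(mcts())
```

The point of the sketch is the framing of kernel optimization as search under verification: UCT balances exploiting edit sequences that profiled well against exploring untried transformations, and only measured rewards (not the proposer's confidence) drive the tree.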

