[2602.12305] OptiML: An End-to-End Framework for Program Synthesis and CUDA Kernel Optimization

arXiv - Machine Learning

Summary

OptiML is an end-to-end framework that maps natural-language intent or existing CUDA code to performance-optimized kernels, combining LLM-based program synthesis with search-based, profiler-verified optimization.

Why It Matters

As CUDA kernel optimization grows increasingly complex, OptiML addresses a critical need at the intersection of machine learning and software engineering. By combining LLM-driven code generation with systematic, profiler-verified optimization, it offers developers a way to improve kernel performance while reducing manual tuning effort.

Key Takeaways

  • OptiML maps natural-language intent or existing CUDA code to performance-optimized kernels.
  • The framework employs two decoupled stages: synthesis (OptiML-G) and search-based optimization (OptiML-X).
  • Candidate optimizations are verified through hardware-aware profiling before being accepted.
  • OptiML outperforms existing LLM baselines in kernel optimization.
  • The search yields interpretable optimization trajectories grounded in profiler data.
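The two-stage flow in the takeaways above can be sketched as a small synthesize-then-optimize loop. This is an illustrative stand-in, not the paper's implementation: the `synthesize`, `profile`, and `optimize` functions, the `Kernel` type, and the fixed 10% per-edit speedup are all hypothetical placeholders for OptiML-G, the hardware profiler, and OptiML-X respectively.

```python
from dataclasses import dataclass

@dataclass
class Kernel:
    source: str        # CUDA source text
    latency_ms: float  # measured latency (stubbed; a real system profiles on hardware)

def synthesize(intent: str) -> Kernel:
    """Stage 1 (OptiML-G analogue): map natural-language intent to an
    initial executable kernel. Stubbed with a trivial kernel here."""
    src = f"// kernel for: {intent}\n__global__ void k(float* x) {{ /* ... */ }}"
    return Kernel(source=src, latency_ms=10.0)

def profile(k: Kernel) -> float:
    """Hardware-aware reward stand-in: simply negative latency."""
    return -k.latency_ms

def optimize(k: Kernel, steps: int = 3) -> Kernel:
    """Stage 2 (OptiML-X analogue): apply candidate edits and keep
    only those the profiler verifies as improvements."""
    best = k
    for step in range(steps):
        # Pretend each proposed edit shaves 10% latency; a real edit
        # would come from an LLM and be re-compiled and re-profiled.
        candidate = Kernel(best.source + f"\n// edit {step}",
                           latency_ms=best.latency_ms * 0.9)
        if profile(candidate) > profile(best):
            best = candidate
    return best

fast = optimize(synthesize("vector add"))
print(round(fast.latency_ms, 2))  # ≈ 7.29 after three accepted edits
```

The key structural point the sketch preserves is verification: an edit is only kept when measured feedback says it helped, rather than trusting the model's proposal.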

Computer Science > Machine Learning
arXiv:2602.12305 (cs)
[Submitted on 12 Feb 2026]

Title: OptiML: An End-to-End Framework for Program Synthesis and CUDA Kernel Optimization
Authors: Arijit Bhattacharjee, Heng Ping, Son Vu Le, Paul Bogdan, Nesreen K. Ahmed, Ali Jannesari

Abstract: Generating high-performance CUDA kernels remains challenging due to the need to navigate a combinatorial space of low-level transformations under noisy and expensive hardware feedback. Although large language models can synthesize functionally correct CUDA code, achieving competitive performance requires systematic exploration and verification of optimization choices. We present OptiML, an end-to-end framework that maps either natural-language intent or input CUDA code to performance-optimized CUDA kernels by formulating kernel optimization as search under verification. OptiML consists of two decoupled stages. When the input is natural language, a Mixture-of-Thoughts generator (OptiML-G) acts as a proposal policy over kernel implementation strategies, producing an initial executable program. A search-based optimizer (OptiML-X) then refines either synthesized or user-provided kernels using Monte Carlo Tree Search over LLM-driven edits, guided by a hardware-aware reward derived from profiler feedback. Each candidate...
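The abstract describes Monte Carlo Tree Search over LLM-driven edits with a profiler-derived reward. A minimal sketch of that loop follows, with heavy hedging: the edit vocabulary, the noisy `reward` model, and the UCT constants are all illustrative assumptions; in OptiML the proposals come from an LLM and the reward from real profiler measurements.

```python
import math
import random

random.seed(0)

class Node:
    def __init__(self, edits, parent=None):
        self.edits = edits      # sequence of edits applied so far
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0        # sum of observed rewards

# Hypothetical edit vocabulary; a real system would have an LLM propose these.
EDITS = ["tile_shared_mem", "unroll_loop", "vectorize_loads", "coalesce_access"]

def untried(node):
    tried = {ch.edits[-1] for ch in node.children}
    return [e for e in EDITS if e not in tried]

def reward(edits):
    """Stand-in for profiler feedback: each distinct edit 'helps',
    with Gaussian noise mimicking noisy hardware measurements."""
    return 1.0 + 0.2 * len(set(edits)) + random.gauss(0, 0.05)

def uct(child, parent, c=1.4):
    if child.visits == 0:
        return float("inf")
    exploit = child.value / child.visits
    explore = c * math.sqrt(math.log(parent.visits) / child.visits)
    return exploit + explore

def mcts(iterations=200, max_depth=3):
    root = Node([])
    for _ in range(iterations):
        node = root
        # 1. Selection: descend by UCT while fully expanded
        while len(node.edits) < max_depth and not untried(node):
            node = max(node.children, key=lambda ch: uct(ch, node))
        # 2. Expansion: try one new edit (LLM-proposal stand-in)
        if len(node.edits) < max_depth:
            edit = random.choice(untried(node))
            child = Node(node.edits + [edit], parent=node)
            node.children.append(child)
            node = child
        # 3. Evaluation: 'profile' the candidate kernel
        r = reward(node.edits)
        # 4. Backpropagation
        while node:
            node.visits += 1
            node.value += r
            node = node.parent
    # Extract the most-visited edit sequence
    node = root
    while node.children:
        node = max(node.children, key=lambda ch: ch.visits)
    return node.edits

print(mcts())
```

The point of the sketch is the framing of kernel optimization as search under verification: UCT balances exploiting edit sequences that profiled well against exploring untried transformations, and only measured rewards (not the proposer's confidence) drive the tree.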

