[2601.07160] AscendKernelGen: A Systematic Study of LLM-Based Kernel Generation for Neural Processing Units



Computer Science > Artificial Intelligence

arXiv:2601.07160 (cs)

[Submitted on 12 Jan 2026 (v1), last revised 17 Apr 2026 (this version, v2)]

Title: AscendKernelGen: A Systematic Study of LLM-Based Kernel Generation for Neural Processing Units

Authors: Xinzi Cao, Jianyang Zhai, Pengfei Li, Zhiheng Hu, Cen Yan, Bingxu Mu, Guanghuan Fang, Bin She, Jiayu Li, Yihan Su, Dongyang Tao, Xiansong Huang, Fan Xu, Feidiao Yang, Yao Lu, Chang-Dong Wang, Yutong Lu, Weicheng Xue, Bin Zhou, Yonghong Tian

Abstract: To meet the ever-increasing demand for computational efficiency, Neural Processing Units (NPUs) have become critical in modern AI infrastructure. However, unlocking their full potential requires developing high-performance compute kernels in vendor-specific Domain-Specific Languages (DSLs), a task that demands deep hardware expertise and is labor-intensive. While Large Language Models (LLMs) have shown promise in general code generation, they struggle with the strict constraints and scarcity of training data in the NPU domain. Our preliminary study reveals that state-of-the-art general-purpose LLMs fail to generate functional complex kernels for Ascend NPUs, yielding a near-zero success rate. To address these challenges, we propose AscendKernelGen, a generation-evaluation integrated framework...

Originally published on April 20, 2026. Curated by AI News.

