[2601.07160] AscendKernelGen: A Systematic Study of LLM-Based Kernel Generation for Neural Processing Units



Computer Science > Artificial Intelligence

arXiv:2601.07160 (cs)

[Submitted on 12 Jan 2026 (v1), last revised 17 Apr 2026 (this version, v2)]

Title: AscendKernelGen: A Systematic Study of LLM-Based Kernel Generation for Neural Processing Units

Authors: Xinzi Cao, Jianyang Zhai, Pengfei Li, Zhiheng Hu, Cen Yan, Bingxu Mu, Guanghuan Fang, Bin She, Jiayu Li, Yihan Su, Dongyang Tao, Xiansong Huang, Fan Xu, Feidiao Yang, Yao Lu, Chang-Dong Wang, Yutong Lu, Weicheng Xue, Bin Zhou, Yonghong Tian

Abstract: To meet the ever-increasing demand for computational efficiency, Neural Processing Units (NPUs) have become critical in modern AI infrastructure. However, unlocking their full potential requires developing high-performance compute kernels in vendor-specific Domain-Specific Languages (DSLs), a task that demands deep hardware expertise and is labor-intensive. While Large Language Models (LLMs) have shown promise in general code generation, they struggle with the strict constraints and scarcity of training data in the NPU domain. Our preliminary study reveals that state-of-the-art general-purpose LLMs fail to generate functional complex kernels for Ascend NPUs, yielding a near-zero success rate. To address these challenges, we propose AscendKernelGen, a generation-evaluation integrated framework...

Originally published on April 20, 2026. Curated by AI News.

