[2601.07160] AscendKernelGen: A Systematic Study of LLM-Based Kernel Generation for Neural Processing Units
Computer Science > Artificial Intelligence
arXiv:2601.07160 (cs)
[Submitted on 12 Jan 2026 (v1), last revised 17 Apr 2026 (this version, v2)]

Title: AscendKernelGen: A Systematic Study of LLM-Based Kernel Generation for Neural Processing Units

Authors: Xinzi Cao, Jianyang Zhai, Pengfei Li, Zhiheng Hu, Cen Yan, Bingxu Mu, Guanghuan Fang, Bin She, Jiayu Li, Yihan Su, Dongyang Tao, Xiansong Huang, Fan Xu, Feidiao Yang, Yao Lu, Chang-Dong Wang, Yutong Lu, Weicheng Xue, Bin Zhou, Yonghong Tian

Abstract: To meet the ever-increasing demand for computational efficiency, Neural Processing Units (NPUs) have become critical in modern AI infrastructure. However, unlocking their full potential requires developing high-performance compute kernels using vendor-specific Domain-Specific Languages (DSLs), a task that demands deep hardware expertise and is labor-intensive. While Large Language Models (LLMs) have shown promise in general code generation, they struggle with the strict constraints and scarcity of training data in the NPU domain. Our preliminary study reveals that state-of-the-art general-purpose LLMs fail to generate functional complex kernels for Ascend NPUs, yielding a near-zero success rate. To address these challenges, we propose AscendKernelGen, a generation-evaluation integrated framework...