[2603.23566] AscendOptimizer: Episodic Agent for Ascend NPU Operator Optimization

arXiv:2603.23566 (cs), Computer Science > Machine Learning
Submitted on 24 Mar 2026

Authors: Jiehao Wu, Zixiao Huang, Wenhao Li, Chuyun Shen, Junjie Sheng, Xiangfeng Wang

Abstract: AscendC (Ascend C) operator optimization on Huawei Ascend neural processing units (NPUs) faces a two-fold knowledge bottleneck: unlike the CUDA ecosystem, there are few public reference implementations to learn from, and performance hinges on a coupled two-part artifact, namely a host-side tiling program that orchestrates data movement and a kernel program that schedules and pipelines instructions. We present AscendOptimizer, an episodic agent that bootstraps this missing expertise by turning execution into experience.

On the host side, AscendOptimizer performs profiling-in-the-loop evolutionary search to discover valid and high-performing tiling and data-movement configurations directly from hardware feedback. On the kernel side, it mines transferable optimization motifs by rewinding optimized kernels (systematically de-optimizing them to synthesize instructive "bad-to-good" trajectories) and distills these motifs into a retrievable experience bank for guided rewriting. By alternating host tuning and kernel rewriting in a closed loop, AscendOptimizer steadily expand...
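
To make the host-side idea concrete, here is a minimal sketch of a profiling-in-the-loop evolutionary search over tiling configurations. It is not the paper's implementation: the `TilingConfig` fields, the candidate values, the `UB_CAPACITY` budget, and the `mock_profile` cost model are all hypothetical stand-ins, and a real setup would replace `mock_profile` with compiling and timing the operator on an Ascend NPU.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TilingConfig:
    # All three fields are hypothetical knobs chosen for illustration.
    tile_len: int    # elements processed per tile
    num_cores: int   # cores the work is split across
    buffer_num: int  # 1 = single buffering, 2 = double buffering


CHOICES = {
    "tile_len": [64, 128, 256, 512, 1024, 2048],
    "num_cores": [1, 2, 4, 8, 16, 32],
    "buffer_num": [1, 2],
}
UB_CAPACITY = 2048  # hypothetical on-chip buffer budget, in elements


def is_valid(cfg):
    # A configuration is valid if its tiles fit in the on-chip buffer.
    return cfg.tile_len * cfg.buffer_num <= UB_CAPACITY


def mock_profile(cfg, total_len=1 << 20):
    """Synthetic latency standing in for real hardware feedback."""
    if not is_valid(cfg):
        return float("inf")  # invalid configs are treated as failed runs
    per_core = total_len / cfg.num_cores
    num_tiles = per_core / cfg.tile_len
    compute = per_core * 1e-3
    copy = per_core * 2e-3 / cfg.buffer_num  # buffering hides copy time
    per_tile_overhead = num_tiles * 0.5
    sync = cfg.num_cores * 0.2
    return compute + copy + per_tile_overhead + sync


def random_config(rng):
    return TilingConfig(**{k: rng.choice(v) for k, v in CHOICES.items()})


def mutate(cfg, rng):
    # Resample one randomly chosen field.
    field = rng.choice(list(CHOICES))
    return replace(cfg, **{field: rng.choice(CHOICES[field])})


def evolve(profile, generations=20, population=12, survivors=4, seed=0):
    rng = random.Random(seed)
    pop = [random_config(rng) for _ in range(population)]
    for _ in range(generations):
        pop.sort(key=profile)   # lower measured latency is better
        elite = pop[:survivors]  # elitism: the best configs always survive
        children = [mutate(rng.choice(elite), rng)
                    for _ in range(population - survivors)]
        pop = elite + children
    return min(pop, key=profile)


if __name__ == "__main__":
    best = evolve(mock_profile)
    print(best, mock_profile(best))
```

Because the elite are carried over unchanged, the best measured latency never gets worse from one generation to the next. Treating invalid configurations as infinitely slow lets the same loop search for validity and performance at once.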

Originally published on March 26, 2026. Curated by AI News.
