[2604.03298] ENEC: A Lossless AI Model Compression Method Enabling Fast Inference on Ascend NPUs
Computer Science > Hardware Architecture
arXiv:2604.03298 (cs)
[Submitted on 28 Mar 2026]

Title: ENEC: A Lossless AI Model Compression Method Enabling Fast Inference on Ascend NPUs

Authors: Jinwu Yang, Jiaan Wu, Zedong Liu, Xinyang Ma, Hairui Zhao, Yida Gu, Yuanhong Huang, Xingchen Liu, Wenjing Huang, Zheng Wei, Jing Xing, Yili Ma, Qingyi Zhang, Baoyi An, Zhongzhe Hu, Shaoteng Liu, Xia Zhu, Jiaxun Lu, Guangming Tan, Dingwen Tao

Abstract: The rapid scaling of Large Language Models presents significant challenges for their deployment and inference, particularly on resource-constrained specialized AI hardware accelerators such as Huawei's Ascend NPUs, where weight data transfer has become a critical performance bottleneck. While lossless compression can preserve model accuracy and reduce data volume, existing lossless compression algorithms exhibit extremely low throughput when ported to the Ascend NPU architecture. In this paper, we propose ENEC, a novel lossless compression method specifically customized for AI model weights and optimized for Ascend Neural Processing Units. ENEC adopts a block-based fixed-length encoding scheme and incorporates a series of NPU-specific optimizations: bit-width quantization with hierarchical halving bit-packing, vectorized branch-free integer transformation, and depen...
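To make the abstract's terminology concrete, the following is a minimal, hypothetical sketch of block-based fixed-length encoding combined with a branch-free signed-to-unsigned integer transform (a standard "zigzag" mapping, as used in varint compression schemes). This is an illustration of the general technique only, not the authors' ENEC implementation; all function names are invented for this sketch.

```python
def zigzag(n: int) -> int:
    # Branch-free map of signed to unsigned integers:
    # 0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ...
    # Small magnitudes (common in model weight residuals) get small codes.
    # Python's arithmetic right shift on a negative int yields -1 (all ones),
    # so the XOR flips the bits exactly when n is negative -- no branch needed.
    return (n << 1) ^ (n >> 63)

def unzigzag(c: int) -> int:
    # Inverse branch-free mapping.
    return (c >> 1) ^ -(c & 1)

def encode_block(vals: list[int]) -> tuple[int, int]:
    # Block-based fixed-length encoding: every value in the block is stored
    # with the same bit width, chosen as the minimum that fits the largest
    # zigzag code in the block.
    codes = [zigzag(v) for v in vals]
    width = max(c.bit_length() for c in codes) or 1
    packed = 0
    for i, c in enumerate(codes):
        packed |= c << (i * width)
    return width, packed

def decode_block(width: int, packed: int, count: int) -> list[int]:
    # Fixed-length decode: each code sits at a known offset, so decoding
    # is a shift and mask per element (easy to vectorize on real hardware).
    mask = (1 << width) - 1
    return [unzigzag((packed >> (i * width)) & mask) for i in range(count)]
```

For example, the block `[0, -1, 3, -7, 2]` zigzags to `[0, 1, 6, 13, 4]`, so the whole block packs at a fixed 4 bits per value and round-trips losslessly. Fixed-length (rather than variable-length) codes are what make decode offsets computable in parallel, which is why this style of scheme suits wide vector units.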