[2604.08826] HiFloat4 Format for Language Model Pre-training on Ascend NPUs
Computer Science > Machine Learning
arXiv:2604.08826 (cs) [Submitted on 9 Apr 2026]

Title: HiFloat4 Format for Language Model Pre-training on Ascend NPUs

Authors: Mehran Taghian, Yunke Peng, Xing Huang, Yao Wang, Yaoyuan Wang, Wei Guo, Yuanyong Luo, Tianchi Hu, Junsong Wang, Xin Wang, Hu Liu, Yu Cheng, Ziwei Yu, Hongliang Li, Mehdi Rahimifar, Lei Yan, Xuefei Wang, Zhuang Ma, Lei Liu, Hui Yu, Anandharaju Durai Raju, Hoang Le, Hei Yi Mak, Tanzila Rahman, Shadan Golestan

Abstract: Large foundation models have become central to modern machine learning, with performance scaling predictably with model size and data. However, training and deploying such models incur substantial computational and memory costs, motivating the development of low-precision training techniques. Recent work has demonstrated that 4-bit floating-point (FP4) formats, such as MXFP4 and NVFP4, can be successfully applied to linear GEMM operations in large language models (LLMs), achieving up to 4x improvements in compute throughput and memory efficiency compared to higher-precision baselines. In this work, we investigate the recently proposed HiFloat4 FP4 format for Huawei Ascend NPUs and systematically compare it with MXFP4 in large-scale training settings. All experiments are conducted on Ascend NPU clusters, with linear and expert GEMM operations ...
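To give a concrete sense of what block-scaled FP4 quantization of GEMM operands involves, the following is a minimal NumPy sketch of an MXFP4-style scheme: blocks of 32 E2M1 elements sharing one power-of-two scale. The scale-selection and rounding rules shown here are simplified assumptions for illustration only; they are not the paper's HiFloat4 definition, nor the exact OCP MX specification or Ascend hardware behavior.

```python
import numpy as np

# Representable magnitudes of an FP4 E2M1 element (sign stored separately):
# 0, 0.5, 1, 1.5, 2, 3, 4, 6
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_mxfp4_block(x, block_size=32):
    """Quantize a 1-D tensor with MXFP4-style block scaling.

    Each block of `block_size` values shares one power-of-two scale
    (E8M0-like), and each element is rounded to the nearest E2M1 value.
    Returns the dequantized tensor, i.e. what a GEMM would effectively
    consume after quantization. Scale choice and rounding are simplified
    assumptions, not a reference implementation of any specific format.
    """
    x = np.asarray(x, dtype=np.float32)
    pad = (-len(x)) % block_size
    xp = np.pad(x, (0, pad))
    blocks = xp.reshape(-1, block_size)

    # Shared scale: a power of two that maps the block's max magnitude
    # near the E2M1 maximum (6); values slightly above 6 saturate on rounding.
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    amax = np.where(amax == 0, 1.0, amax)
    scale = 2.0 ** np.floor(np.log2(amax / E2M1_GRID[-1]))

    # Scale the block, round each magnitude to the nearest E2M1 grid point,
    # then restore the sign.
    scaled = blocks / scale
    idx = np.abs(np.abs(scaled)[..., None] - E2M1_GRID).argmin(axis=-1)
    q = np.sign(scaled) * E2M1_GRID[idx]

    # Dequantize and trim the padding.
    return (q * scale).reshape(-1)[: len(x)]

if __name__ == "__main__":
    w = np.random.randn(64).astype(np.float32)
    w_q = quantize_mxfp4_block(w)
    print("max abs quantization error:", np.abs(w - w_q).max())
```

In this sketch the only per-block metadata is the shared power-of-two scale, which is what gives block-scaled FP4 formats their memory advantage over FP8/FP16 while keeping dynamic range per block; formats such as NVFP4 and HiFloat4 differ mainly in block size, scale encoding, and element grid.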