[2602.12635] Unleashing Low-Bit Inference on Ascend NPUs: A Comprehensive Evaluation of HiFloat Formats


Summary

This article summarizes an arXiv paper that evaluates HiFloat formats for low-bit inference on Ascend NPUs, highlighting their efficiency and their compatibility with state-of-the-art quantization frameworks.

Why It Matters

As large language models (LLMs) continue to grow, optimizing their performance through low-bit floating-point formats is crucial. This research provides insights into how HiFloat formats can enhance inference efficiency, which is significant for developers and researchers working on AI and machine learning applications.

Key Takeaways

  • INT8 is effective for narrow-range data, while HiFloat formats excel with high-variance data.
  • HiF4's hierarchical scaling prevents accuracy collapse in 4-bit regimes.
  • HiFloat formats are compatible with leading post-training quantization frameworks.
  • Low-bit inference can significantly enhance efficiency in LLM applications.
  • The evaluation provides a pathway for optimizing AI model performance on NPUs.
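The second takeaway, that hierarchical (multi-level) scaling prevents accuracy collapse at 4 bits, can be illustrated with a toy experiment. The sketch below is not the actual HiF4 codec; it compares a single per-tensor scale against an additional per-block scale for a plain 4-bit integer grid, on heavy-tailed data where a single outlier otherwise dominates the shared scale. The function name and block size are illustrative choices, not from the paper.

```python
import numpy as np

def fake_quant_int4(x, block=None):
    # Symmetric 4-bit uniform quantizer (integer levels -7..7).
    # block=None: one scale for the whole tensor.
    # block=k:    one scale per k contiguous values, i.e. a second,
    #             finer level of scaling on top of the tensor.
    shape = x.shape
    x2 = x.reshape(1, -1) if block is None else x.reshape(-1, block)
    scale = np.max(np.abs(x2), axis=1, keepdims=True) / 7.0
    scale = np.where(scale == 0, 1.0, scale)        # avoid divide-by-zero
    q = np.clip(np.round(x2 / scale), -7, 7)
    return (q * scale).reshape(shape)

rng = np.random.default_rng(0)
x = rng.standard_t(df=2, size=4096)  # heavy-tailed, outlier-prone values

mse_tensor = float(np.mean((x - fake_quant_int4(x)) ** 2))
mse_block = float(np.mean((x - fake_quant_int4(x, block=32)) ** 2))
print(f"per-tensor 4-bit MSE: {mse_tensor:.4f}")
print(f"per-block  4-bit MSE: {mse_block:.4f}")
```

With one global scale, a few outliers inflate the step size and most values round to zero; with a block-local scale, only the blocks containing outliers pay that cost, so the overall reconstruction error drops sharply.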

Computer Science > Computation and Language

arXiv:2602.12635 (cs) [Submitted on 13 Feb 2026]

Title: Unleashing Low-Bit Inference on Ascend NPUs: A Comprehensive Evaluation of HiFloat Formats

Authors: Pengxiang Zhao, Hui-Ling Zhen, Xing Li, Han Bao, Weizhe Lin, Zhiyuan Yang, Ziwei Yu, Xin Wang, Mingxuan Yuan, Xianzhi Yu, Zhenhua Dong

Abstract: As LLMs scale, low-bit floating-point formats like MXFP and NVFP4 offer new opportunities for precision and efficiency. In this work, we evaluate HiFloat (HiF8 and HiF4), a family of formats tailored for Ascend NPUs. Through rigorous comparison across weight-activation and KV-cache tasks, we provide three key insights: (1) INT8 suits narrow-range data, while floating-point formats excel with high-variance data; (2) in 4-bit regimes, HiF4's hierarchical scaling prevents the accuracy collapse seen in integer formats; and (3) HiFloat is fully compatible with state-of-the-art post-training quantization frameworks. Overall, HiFloat provides a solution for high-efficiency LLM inference on NPUs.

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Cite as: arXiv:2602.12635 [cs.CL] (or arXiv:2602.12635v1 [cs.CL] for this version), https://doi.org/10.48550/arXiv.2602.12635
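Insight (1), that integer grids suit narrow-range data while floating-point grids handle high-variance data better, comes down to how the representable levels are spaced. The sketch below is a hedged illustration, not the paper's methodology: it compares a uniform 4-bit integer grid against an E2M1-style 4-bit float grid (both at 4 bits, to isolate the grid shape rather than the bit width) on flat-range versus bell-shaped data. The level tables and distributions are illustrative assumptions.

```python
import numpy as np

INT4_LEVELS = np.arange(8.0)                           # uniform grid 0..7
FP4_LEVELS = np.array([0, .5, 1, 1.5, 2, 3, 4, 6])     # E2M1-style magnitudes

def quant(x, levels):
    # Scale so the largest level reaches max|x|, then round each
    # magnitude to the nearest representable level, keeping the sign.
    scale = np.max(np.abs(x)) / levels[-1]
    mag = np.abs(x) / scale
    idx = np.argmin(np.abs(mag[:, None] - levels[None, :]), axis=1)
    return np.sign(x) * levels[idx] * scale

def mse(x, levels):
    return float(np.mean((x - quant(x, levels)) ** 2))

rng = np.random.default_rng(0)
flat = rng.uniform(-1, 1, 8192)   # narrow, flat-range values
bell = rng.normal(0, 1, 8192)     # bell-shaped, wider dynamic range

print("flat-range:", mse(flat, INT4_LEVELS), mse(flat, FP4_LEVELS))
print("bell-shaped:", mse(bell, INT4_LEVELS), mse(bell, FP4_LEVELS))
```

The uniform grid spends its levels evenly, which is ideal when values fill a narrow range uniformly; the float grid packs more levels near zero, which pays off when most values are small but occasional large ones stretch the range.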

