[2602.11287] HiFloat4 Format for Language Model Inference

arXiv - Machine Learning

Summary

The paper introduces HiFloat4 (HiF4), a block floating-point data format tailored for deep learning. It improves accuracy in language model inference while reducing hardware area and power consumption.

Why It Matters

HiFloat4 represents a significant advancement in data formats for deep learning, addressing the growing need for efficient computation in language models. Its ability to enhance accuracy while minimizing resource usage is crucial for the future of AI applications, particularly as models become larger and more complex.

Key Takeaways

  • HiFloat4 packs 64 4-bit elements with 32 bits of shared scaling metadata, averaging 4.5 bits per value.
  • The large 64-element group size lets matrix multiplications run largely in fixed-point arithmetic, reducing hardware area and power consumption.
  • A three-level scaling hierarchy captures both inter- and intra-group dynamic range, improving utilization of the representational space.
  • HiFloat4 achieves higher average accuracy than the NVFP4 format across multiple language models and downstream tasks.
  • Low-bit formats like this are increasingly important as large language models grow in size and deployment cost.
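The 4.5-bit average follows directly from the unit layout described above; a quick arithmetic check:

```python
# Per-value storage cost of one HiF4 unit, using the paper's figures:
# 64 elements x 4 bits each, plus 32 bits of shared scaling metadata.
ELEMENTS_PER_UNIT = 64
BITS_PER_ELEMENT = 4
METADATA_BITS = 32

total_bits = ELEMENTS_PER_UNIT * BITS_PER_ELEMENT + METADATA_BITS
bits_per_value = total_bits / ELEMENTS_PER_UNIT
print(bits_per_value)  # 4.5
```

The 32 metadata bits amortized over 64 elements add only 0.5 bits per value on top of the 4-bit payload.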

Computer Science > Machine Learning
arXiv:2602.11287 (cs)
[Submitted on 11 Feb 2026 (v1), last revised 13 Feb 2026 (this version, v2)]

Title: HiFloat4 Format for Language Model Inference

Authors: Yuanyong Luo, Jing Huang, Yu Cheng, Ziwei Yu, Kaihua Tang, Xinda Ma, Xin Wang, Anping Tong, Guipeng Hu, Yun Xu, Mehran Taghian, Peng Wu, Guanglin Li, Yunke Peng, Tianchi Hu, Minqi Chen, Michael Bi Mi, Hu Liu, Xiping Zhou, Junsong Wang, Qiang Lin, Heng Liao

Abstract: This paper introduces HiFloat4 (HiF4), a block floating-point data format tailored for deep learning. Each HiF4 unit packs 64 4-bit elements with 32 bits of shared scaling metadata, averaging 4.5 bits per value. The metadata specifies a three-level scaling hierarchy, capturing inter- and intra-group dynamic range while improving the utilization of the representational space. In addition, the large 64-element group size enables matrix multiplications to be executed in a highly fixed-point manner, significantly reducing hardware area and power consumption. To evaluate the proposed format, we conducted inference experiments on several language models, including LLaMA, Qwen, Mistral, DeepSeek-V3.1 and LongCat. Results show that HiF4 achieves higher average accuracy than the state-of-the-art NVFP4 format across multiple models and diverse downstream tasks.
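The abstract does not spell out HiF4's three-level scale encoding, but the underlying block floating-point idea can be sketched with a single shared power-of-two scale per group. The sketch below is an illustrative simplification, not the paper's actual metadata layout:

```python
import math

def quantize_block(values, n_bits=4):
    """Illustrative single-scale block quantization (NOT HiF4's
    three-level scheme): every element in the block shares one
    power-of-two scale chosen from the block's maximum magnitude."""
    qmax = 2 ** (n_bits - 1) - 1  # 7 for signed 4-bit integers
    amax = max(abs(v) for v in values) or 1.0
    # Smallest power-of-two scale that keeps amax representable.
    scale = 2.0 ** math.ceil(math.log2(amax / qmax))
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale

def dequantize_block(q, scale):
    return [x * scale for x in q]

vals = [0.11, -0.52, 0.97, -0.03]
q, s = quantize_block(vals)          # q = [0, -2, 4, 0], s = 0.25
approx = dequantize_block(q, s)      # [0.0, -0.5, 1.0, 0.0]
```

With one scale per block, reconstruction error is bounded by half the scale; HiF4's additional scale levels exist precisely to tighten that bound when magnitudes vary within and across groups.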

