[2602.11287] HiFloat4 Format for Language Model Inference
Summary
The paper introduces HiFloat4 (HiF4), a 4-bit block floating-point format designed for deep learning that improves language model inference accuracy while reducing hardware area and power consumption.
Why It Matters
HiFloat4 addresses the growing need for efficient computation in language model inference, where weight and activation storage dominates memory traffic and serving cost. A sub-5-bit format that preserves accuracy while cutting hardware area and power becomes increasingly valuable as models grow larger and more complex.
Key Takeaways
- HiFloat4 packs 64 4-bit elements with 32 bits of shared scaling metadata, averaging 4.5 bits per value.
- The large 64-element group size lets matrix multiplications execute in a largely fixed-point manner, reducing hardware area and power consumption.
- HiFloat4 outperforms the NVFP4 format in accuracy across multiple language models and tasks.
- The three-level scaling hierarchy captures inter- and intra-group dynamic range while improving utilization of the representational space.
- Inference experiments on LLaMA, Qwen, Mistral, DeepSeek-V3.1, and LongCat validate the format across diverse downstream tasks.
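The storage figures in the takeaways follow directly from the unit layout described in the abstract. A quick sketch of the arithmetic (FP16 is used as the baseline here for illustration; the paper's own baselines may differ):

```python
# Bits per value for one HiF4 unit, per the abstract:
# 64 four-bit elements plus 32 bits of shared scaling metadata.
element_bits = 64 * 4                       # 256 bits of element payload
metadata_bits = 32                          # shared three-level scale metadata
bits_per_value = (element_bits + metadata_bits) / 64
print(bits_per_value)                       # 4.5 bits per value, as stated

# Illustrative compression ratio versus 16-bit weights (assumed baseline):
print(16 / bits_per_value)
```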
Abstract
Computer Science > Machine Learning, arXiv:2602.11287 (cs). Submitted on 11 Feb 2026 (v1), last revised 13 Feb 2026 (this version, v2).
Title: HiFloat4 Format for Language Model Inference
Authors: Yuanyong Luo, Jing Huang, Yu Cheng, Ziwei Yu, Kaihua Tang, Xinda Ma, Xin Wang, Anping Tong, Guipeng Hu, Yun Xu, Mehran Taghian, Peng Wu, Guanglin Li, Yunke Peng, Tianchi Hu, Minqi Chen, Michael Bi Mi, Hu Liu, Xiping Zhou, Junsong Wang, Qiang Lin, Heng Liao
This paper introduces HiFloat4 (HiF4), a block floating-point data format tailored for deep learning. Each HiF4 unit packs 64 4-bit elements with 32 bits of shared scaling metadata, averaging 4.5 bits per value. The metadata specifies a three-level scaling hierarchy, capturing inter- and intra-group dynamic range while improving the utilization of the representational space. In addition, the large 64-element group size enables matrix multiplications to be executed in a highly fixed-point manner, significantly reducing hardware area and power consumption. To evaluate the proposed format, we conducted inference experiments on several language models, including LLaMA, Qwen, Mistral, DeepSeek-V3.1, and LongCat. Results show that HiF4 achieves higher average accuracy than the state-of-the-art NVFP4 format across multiple models and diverse downstream tasks.
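The abstract does not specify HiF4's bit layout or how the three-level scaling hierarchy is encoded, so the sketch below is not the HiF4 format itself. It illustrates only the basic block floating-point idea the format builds on: a 64-element group shares one power-of-two scale, and each element is stored as a small signed integer. The function names and the single-level scale are assumptions for illustration.

```python
# Generic single-level block floating-point sketch (NOT the HiF4 encoding).
# A 64-element block shares one power-of-two scale; each element becomes
# a 4-bit-style signed integer in [-8, 7].
import math

BLOCK = 64      # group size, matching HiF4's 64-element unit
MAX_MAG = 7     # largest positive magnitude of a 4-bit signed integer

def quantize_block(values):
    """Return (quantized ints, shared scale) for one 64-element block."""
    assert len(values) == BLOCK
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return [0] * BLOCK, 1.0
    # Choose the smallest power-of-two scale such that amax/scale <= MAX_MAG.
    exp = math.ceil(math.log2(amax / MAX_MAG))
    scale = 2.0 ** exp
    q = [max(-MAX_MAG - 1, min(MAX_MAG, round(v / scale))) for v in values]
    return q, scale

def dequantize_block(q, scale):
    """Reconstruct approximate real values from integers and shared scale."""
    return [x * scale for x in q]
```

With a shared power-of-two scale, dequantization is a shift rather than a multiply, which is what makes largely fixed-point matrix multiplication hardware practical; HiF4's contribution, per the abstract, is layering three scale levels so that both inter- and intra-group dynamic range are captured.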