[2603.02731] Practical FP4 Training for Large-Scale MoE Models on Hopper GPUs
Computer Science > Machine Learning

arXiv:2603.02731 (cs) [Submitted on 3 Mar 2026]

Title: Practical FP4 Training for Large-Scale MoE Models on Hopper GPUs
Authors: Wuyue Zhang, Chongdong Huang, Chunbo You, Cheng Gu, Fengjuan Wang, Mou Sun

Abstract: Training large-scale Mixture-of-Experts (MoE) models is bottlenecked by activation memory and expert-parallel communication, yet FP4 training remains impractical on Hopper-class GPUs without native MXFP4 or NVFP4 support. In this work, we present a training recipe that enables MXFP4 efficiency for MoE models on Hopper architectures without native 4-bit computation support. A central challenge is to integrate FP4 into an existing BF16/FP8 hybrid training pipeline without incurring costly precision round-trips (e.g., FP4 $\leftrightarrow$ BF16 $\leftrightarrow$ FP8). We address this challenge by introducing direct FP8-to-FP4 quantization and de-quantization, together with scaling-aware FP4 row-wise to column-wise conversion, enabling FP4 activations and expert-parallel communication with minimal overhead. Core MoE computations are executed in FP8, while activations and expert-parallel communication are compressed using MXFP4, achieving substantial memory and bandwidth savings without degrading convergence. At the 671B parameter scale, our method achieves end-to-end training ...
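For context on the MXFP4 compression the abstract refers to: the OCP Microscaling (MX) format groups elements into small blocks (32 in the spec) that share a single power-of-two scale, with each element stored as FP4 E2M1 (representable magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6). Below is a minimal NumPy sketch of that block quantization; it is an illustration of the format, not the authors' pipeline, and the function names are hypothetical.

```python
import numpy as np

# Representable non-negative values of FP4 E2M1, the MXFP4 element format.
FP4_E2M1_VALUES = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def mxfp4_quantize_block(block):
    """Quantize one block (32 elements in the MX spec) to MXFP4:
    a shared power-of-two scale plus FP4 E2M1 element values.
    Returns (scale, quantized_elements)."""
    amax = np.max(np.abs(block))
    if amax == 0.0:
        return 1.0, np.zeros_like(block)
    # Shared scale is a power of two chosen so the block maximum lands
    # near the top of the E2M1 range (emax of E2M1 is 2, i.e. 6 = 1.5 * 2^2).
    scale = 2.0 ** (np.floor(np.log2(amax)) - 2)
    scaled = block / scale
    # Round each element to the nearest representable E2M1 magnitude,
    # saturating anything above 6; the sign is kept separately.
    signs = np.sign(scaled)
    idx = np.abs(np.abs(scaled)[:, None] - FP4_E2M1_VALUES[None, :]).argmin(axis=1)
    return scale, signs * FP4_E2M1_VALUES[idx]

def mxfp4_dequantize(scale, quantized):
    """Reconstruct approximate values from the shared scale and elements."""
    return scale * quantized
```

The paper's contribution is orthogonal to this sketch: it avoids the FP4 ↔ BF16 ↔ FP8 round-trips by quantizing directly between FP8 and FP4 and by making the row-to-column layout conversion scaling-aware, so the per-block scales survive the transpose.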