[2604.02651] Communication-free Sampling and 4D Hybrid Parallelism for Scalable Mini-batch GNN Training
Computer Science > Machine Learning

arXiv:2604.02651 (cs)

[Submitted on 3 Apr 2026]

Title: Communication-free Sampling and 4D Hybrid Parallelism for Scalable Mini-batch GNN Training

Authors: Cunyang Wei, Siddharth Singh, Aishwarya Sarkar, Daniel Nichols, Tisha Patel, Aditya K. Ranjan, Sayan Ghosh, Ali Jannesari, Nathan R. Tallent, Abhinav Bhatele

Abstract: Graph neural networks (GNNs) are widely used for learning on graph datasets derived from various real-world scenarios. Learning from extremely large graphs requires distributed training, and mini-batching with sampling is a popular approach for parallelizing GNN training. Existing distributed mini-batch approaches have significant performance bottlenecks due to expensive sampling methods and limited scaling when using data parallelism. In this work, we present ScaleGNN, a 4D parallel framework for scalable mini-batch GNN training that combines communication-free distributed sampling, 3D parallel matrix multiplication (PMM), and data parallelism. ScaleGNN introduces a uniform vertex sampling algorithm, enabling each process (GPU device) to construct its local mini-batch, i.e., subgraph partition, without any inter-process communication. 3D PMM enables scaling mini-batch training to much larger GPU counts than vanilla data parallelism with si...
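The abstract does not spell out how the communication-free sampling works; a common way to achieve it, and a minimal sketch of the idea, is to give every rank the same RNG seed so that each process independently reproduces the identical global mini-batch and then keeps only its own shard. The function name and shard layout below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def local_minibatch(rank, world_size, num_vertices, batch_size, seed):
    """Illustrative sketch: every rank draws the SAME global mini-batch
    from a shared seed, so no inter-process communication is needed."""
    rng = np.random.default_rng(seed)  # identical stream on all ranks
    global_batch = rng.choice(num_vertices, size=batch_size, replace=False)
    # Each rank keeps only its contiguous shard of the shared batch.
    return np.array_split(global_batch, world_size)[rank]

# Simulate 4 ranks; their shards partition one 64-vertex mini-batch.
shards = [local_minibatch(r, 4, 1000, 64, seed=42) for r in range(4)]
assert sum(len(s) for s in shards) == 64
```

Because the global batch is deterministic given the seed, the shards are disjoint and together cover the whole mini-batch, which is the property that lets each GPU build its subgraph partition locally.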