[2512.01678] Morphling: Fast, Fused, and Flexible GNN Training at Scale
Computer Science > Machine Learning
arXiv:2512.01678 (cs)

This paper has been withdrawn by Anubhab Anubhab.

[Submitted on 1 Dec 2025 (v1), last revised 26 Mar 2026 (this version, v4)]

Title: Morphling: Fast, Fused, and Flexible GNN Training at Scale
Authors: Anubhab, Rupesh Nasre

Abstract: Graph Neural Networks (GNNs) present a fundamental hardware challenge by fusing irregular, memory-bound graph traversals with regular, compute-intensive dense matrix operations. While frameworks such as PyTorch Geometric (PyG) and Deep Graph Library (DGL) prioritize high-level usability, they fail to address these divergent execution characteristics. As a result, they rely on generic kernels that suffer from poor cache locality, excessive memory movement, and substantial intermediate allocations. To address these limitations, we present Morphling, a domain-specific code synthesizer designed to bridge this gap. Morphling compiles high-level GNN specifications into portable, backend-specialized implementations targeting OpenMP, CUDA, and MPI. It achieves this by instantiating a library of optimized, architecture-aware primitives tailored to each execution environment. Morphling also incorporates a runtime sparsity-aware execution engine that dynamically selects dense or sparse execution paths using input feature statistics, reducing...
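The abstract's final point, a runtime engine that picks a dense or sparse execution path from input feature statistics, can be illustrated with a minimal sketch. This is not Morphling's actual implementation; the `density` threshold, function names, and the pure-Python matmuls below are hypothetical stand-ins for the idea of measuring nonzero density at runtime and dispatching accordingly.

```python
def density(features):
    """Fraction of nonzero entries in a row-major feature matrix."""
    total = sum(len(row) for row in features)
    nonzero = sum(1 for row in features for x in row if x != 0.0)
    return nonzero / total if total else 0.0

def dense_matmul(features, weight):
    # Dense path: visit every entry, including zeros.
    cols = len(weight[0])
    return [[sum(row[k] * weight[k][j] for k in range(len(row)))
             for j in range(cols)] for row in features]

def sparse_matmul(features, weight):
    # Sparse path: skip zero entries; pays off when features are mostly zero.
    cols = len(weight[0])
    out = []
    for row in features:
        acc = [0.0] * cols
        for k, v in enumerate(row):
            if v != 0.0:
                for j in range(cols):
                    acc[j] += v * weight[k][j]
        out.append(acc)
    return out

def feature_transform(features, weight, threshold=0.25):
    # Hypothetical dispatch: measure density once, then pick a path.
    # The 0.25 cutoff is illustrative, not a value from the paper.
    if density(features) >= threshold:
        return dense_matmul(features, weight)
    return sparse_matmul(features, weight)
```

Both paths compute the same product, so the dispatch is purely a performance decision; a real engine would amortize the density measurement and use architecture-specific kernels rather than Python loops.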