[2407.15264] LSM-GNN: Large-scale Storage-based Multi-GPU GNN Training by Optimizing Data Transfer Scheme
Computer Science > Distributed, Parallel, and Cluster Computing

arXiv:2407.15264 (cs)

[Submitted on 21 Jul 2024 (v1), last revised 28 Mar 2026 (this version, v2)]

Title: LSM-GNN: Large-scale Storage-based Multi-GPU GNN Training by Optimizing Data Transfer Scheme

Authors: Jeongmin Brian Park, Kun Wu, Vikram Sharma Mailthody, Zaid Qureshi, Scott Mahlke, Wen-mei Hwu

Abstract: Graph Neural Networks (GNNs) are widely used today in recommendation systems, fraud detection, and node/link classification tasks. Real-world GNNs continue to scale in size and require a large memory footprint for storing graphs and embeddings, which often exceeds the memory capacities of the GPUs used for training. To address limited memory capacity, traditional GNN training approaches use graph partitioning and sharding techniques to scale up across multiple GPUs within a node and/or scale out across multiple nodes. However, this approach suffers from the high computational cost of graph partitioning algorithms and inefficient communication across GPUs. To address these overheads, we propose the Large-scale Storage-based Multi-GPU GNN framework (LSM-GNN), a storage-based approach to training GNN models that utilizes a novel communication layer enabling GPU software caches to function as a system-wide shared cache...
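To make the system-wide shared cache idea concrete, below is a minimal illustrative sketch (not the authors' implementation; the class, method names, and the fetch_from_storage callable are all assumptions). It shows the general pattern the abstract describes: an embedding lookup first checks the local GPU's cache slice, then peer GPUs' slices via device-to-device copies, and only falls back to storage on a system-wide miss.

```python
# Hypothetical sketch of a system-wide shared embedding cache across GPUs.
import torch


class SharedEmbeddingCache:
    """Each GPU holds a slice of cached embeddings; a lookup prefers any
    cached copy in the system (local or peer GPU) over a storage read."""

    def __init__(self, num_gpus, cache_rows_per_gpu, emb_dim):
        self.num_gpus = num_gpus
        # Per-GPU cache tensors and a map from node ID to cached row.
        self.cache = [torch.empty(cache_rows_per_gpu, emb_dim, device=f"cuda:{g}")
                      for g in range(num_gpus)]
        self.index = [dict() for _ in range(num_gpus)]  # node_id -> row in cache[g]

    def lookup(self, gpu, node_ids, fetch_from_storage):
        """Return embeddings for node_ids on `gpu`."""
        out = torch.empty(len(node_ids), self.cache[gpu].size(1),
                          device=f"cuda:{gpu}")
        misses = []
        for i, nid in enumerate(node_ids):
            hit = False
            # Check every GPU's cache slice; a remote hit is a peer-to-peer
            # copy, which is still far cheaper than a storage access.
            for g in range(self.num_gpus):
                row = self.index[g].get(nid)
                if row is not None:
                    out[i] = self.cache[g][row].to(f"cuda:{gpu}", non_blocking=True)
                    hit = True
                    break
            if not hit:
                misses.append((i, nid))
        if misses:
            # fetch_from_storage is a user-supplied callable (assumption) that
            # reads the missed embeddings from SSD/host storage.
            fetched = fetch_from_storage([nid for _, nid in misses])
            for (i, _), emb in zip(misses, fetched):
                out[i] = emb.to(f"cuda:{gpu}")
        return out
```

The sketch only captures the lookup path; eviction, cache-fill policy, and the actual transfer mechanism are where a real framework such as LSM-GNN would differ.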