[2501.12032] Accelerating Recommender Model ETL with a Streaming FPGA-GPU Dataflow
Summary
The paper presents PipeRec, a hardware-accelerated ETL engine co-designed with online recommender model training. It compiles ETL operators into FPGA dataflows, streams training-ready batches directly into GPU memory, and overlaps preprocessing with GPU training to raise throughput.
Why It Matters
As recommender systems increasingly rely on real-time data, the ETL stage has become the dominant bottleneck, and production systems often dedicate clusters of CPU servers to feed a single GPU node. PipeRec targets this preprocessing bottleneck: it raises GPU utilization and shortens training time, which matters for any deployment that must integrate new interaction data quickly and at lower operational cost.
Key Takeaways
- PipeRec accelerates ETL throughput by over 10x compared to CPU-based systems.
- The system maintains high GPU utilization (64-91%) during training.
- It reduces end-to-end training time to 9.94% of the time taken by traditional CPU-GPU pipelines.
- PipeRec compiles software-defined ETL operators into reconfigurable FPGA dataflows and streams batches into GPU memory via P2P DMA transfers.
- The approach addresses the growing need for efficient data processing in real-time recommender models.
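The training-time figure in the takeaways can be restated as a speedup factor. The short sketch below does that arithmetic; the function name `speedup_from_time_fraction` is a hypothetical helper, and the only input taken from the text is the reported 9.94% figure.

```python
def speedup_from_time_fraction(fraction: float) -> float:
    """Convert 'new time as a fraction of old time' into a speedup factor."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return 1.0 / fraction


# 9.94% is the end-to-end training time figure quoted in the takeaways.
reported_fraction = 0.0994
speedup = speedup_from_time_fraction(reported_fraction)
print(f"{speedup:.2f}x")  # prints "10.06x"
```

A training time of 9.94% of the baseline therefore corresponds to roughly a 10x end-to-end speedup, which agrees with the "over 10x" ETL throughput claim in the first takeaway.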
Computer Science > Hardware Architecture
arXiv:2501.12032 (cs)
[Submitted on 21 Jan 2025 (v1), last revised 25 Feb 2026 (this version, v3)]
Title: Accelerating Recommender Model ETL with a Streaming FPGA-GPU Dataflow
Authors: Yu Zhu, Wenqi Jiang, Piyumi Jasin Pathiranage, Yongjun He, Gustavo Alonso
Abstract: The real-time performance of recommender models depends on the continuous integration of massive volumes of new user interaction data into training pipelines. While GPUs have scaled model training throughput, the data preprocessing stage - commonly expressed as Extract-Transform-Load (ETL) pipelines - has emerged as the dominant bottleneck. Production systems often dedicate clusters of CPU servers to support a single GPU node, leading to high operational cost. To address this issue, we present PipeRec, a hardware-accelerated ETL engine co-designed with online recommender model training. PipeRec introduces a training-aware ETL abstraction that exposes freshness, ordering, and batching semantics while compiling software-defined operators into reconfigurable FPGA dataflows and overlaps ETL with GPU training to maximize utilization under I/O constraints. To eliminate CPU bottlenecks, PipeRec implements a format-aware packer that streams training-ready batches directly into GPU memory via P2P DMA transfers, enabling zero-copy ingest and e...
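The idea of overlapping ETL with training can be illustrated with a software-only analogy. The sketch below is not PipeRec's FPGA dataflow or its P2P DMA path. It is a minimal producer-consumer pipeline using Python threads and a bounded queue, where one thread prepares batches while another consumes them. All names (`transform`, `etl_producer`, `train_consumer`, `run_pipeline`) and the queue depth of 2 are assumptions made for illustration.

```python
import queue
import threading

SENTINEL = None  # marks the end of the batch stream


def transform(record):
    # Hypothetical stand-in for an ETL operator (e.g. scaling a feature).
    return record * 2


def etl_producer(raw_records, batch_size, out_queue):
    """Transform records, pack them into batches, and enqueue each batch."""
    batch = []
    for record in raw_records:
        batch.append(transform(record))
        if len(batch) == batch_size:
            out_queue.put(batch)  # blocks when the queue is full (backpressure)
            batch = []
    if batch:
        out_queue.put(batch)  # final partial batch
    out_queue.put(SENTINEL)


def train_consumer(in_queue):
    """Stand-in for a training loop: count steps and samples consumed."""
    steps, samples = 0, 0
    while True:
        batch = in_queue.get()
        if batch is SENTINEL:
            break
        steps += 1
        samples += len(batch)
    return steps, samples


def run_pipeline(raw_records, batch_size=4, depth=2):
    """Run ETL in a background thread while the caller thread 'trains'."""
    batches = queue.Queue(maxsize=depth)  # depth 2 behaves like double buffering
    producer = threading.Thread(
        target=etl_producer, args=(raw_records, batch_size, batches)
    )
    producer.start()
    result = train_consumer(batches)
    producer.join()
    return result


steps, samples = run_pipeline(range(10), batch_size=4)
print(steps, samples)  # prints "3 10"
```

The bounded queue is the design choice that matters here: the producer can run at most `depth` batches ahead, so preprocessing of the next batch proceeds while the current one is being consumed, without unbounded buffering.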