[2603.10030] The DMA Streaming Framework: Kernel-Level Buffer Orchestration for High-Performance AI Data Paths
arXiv:2603.10030 [cs], Computer Science: Hardware Architecture
Submitted on 26 Feb 2026 (v1); last revised 25 Mar 2026 (this version, v2)
Author: Marco Graziano

Abstract: AI transport libraries move bytes efficiently, but they commonly assume that buffers have already been correctly allocated, placed, shared, and registered, and that they remain safe under completion and teardown pressure. This paper presents dmaplane, a Linux kernel module that makes this missing layer explicit as buffer orchestration. dmaplane exposes a stable kernel UAPI via /dev/dmaplane and composes ring-based command channels, DMA buffer lifecycle management, dma-buf export for cross-device sharing, a kernel-space RDMA engine, NUMA-aware allocation and verification, credit-based flow control, low-overhead observability, and GPU memory integration via PCIe BAR pinning. We evaluate orchestration sensitivity by measuring NUMA cross-node penalties at DRAM scale, completion-safe flow control under sustained RDMA load, and GPU BAR mapping tiers versus cudaMemcpy. We also demonstrate end-to-end disaggregated inference by transferring KV-cache chunks between two machines using RDMA WRITE WITH IMMEDIATE and reconstructing tensor views on the receiver....
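The abstract names credit-based flow control and completion safety as components of dmaplane but does not describe their mechanics. The Python sketch below models one common form of the idea, under stated assumptions: each posted RDMA work request consumes one credit, a credit is returned only when that request's completion is observed, and completions arrive in posting order. The class and method names (`CreditFlowControl`, `submit`, `complete`, `teardown_safe`) are hypothetical illustrations, not dmaplane's UAPI or kernel code.

```python
from collections import deque


class CreditFlowControl:
    """Hypothetical model of credit-based flow control (not dmaplane's API).

    Assumptions: one credit per posted work request, credits returned only
    on completion, and completions arriving in posting order.
    """

    def __init__(self, max_credits: int) -> None:
        if max_credits <= 0:
            raise ValueError("max_credits must be positive")
        self.max_credits = max_credits
        self.credits = max_credits  # credits currently available
        self.in_flight = deque()    # posted, not yet completed
        self.backlog = deque()      # waiting for a free credit
        self.completed = []         # retired work-request IDs, in order

    def submit(self, wr_id: int) -> bool:
        """Post wr_id if a credit is free, otherwise queue it. Returns True if posted."""
        if self.credits > 0:
            self.credits -= 1
            self.in_flight.append(wr_id)
            return True
        self.backlog.append(wr_id)
        return False

    def complete(self) -> int:
        """Retire the oldest in-flight request, return its credit, and
        immediately spend that credit on the next backlogged request."""
        if not self.in_flight:
            raise RuntimeError("completion with nothing in flight")
        wr_id = self.in_flight.popleft()
        self.completed.append(wr_id)
        self.credits += 1
        if self.backlog:
            self.credits -= 1
            self.in_flight.append(self.backlog.popleft())
        # Invariant: every credit is either free or held by an in-flight request.
        assert self.credits + len(self.in_flight) == self.max_credits
        return wr_id

    def teardown_safe(self) -> bool:
        """True only when nothing is posted or pending, so buffers may be freed."""
        return not self.in_flight and not self.backlog


if __name__ == "__main__":
    fc = CreditFlowControl(max_credits=2)
    print([fc.submit(i) for i in range(4)])  # [True, True, False, False]
    while fc.in_flight:
        fc.complete()
    print(fc.completed, fc.teardown_safe())  # [0, 1, 2, 3] True
```

In this model, with two credits, four submissions post two requests and queue two; each completion frees a credit that is spent at once on the next queued request. `teardown_safe()` becomes true only after every posted request has completed, which is the property that would let buffers be released without an in-flight DMA still targeting them.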