[2602.22434] GetBatch: Distributed Multi-Object Retrieval for ML Data Loading
Summary
GetBatch is a new object store API that replaces many individual GET requests with a single batch retrieval operation, yielding significant throughput and latency improvements over per-object GETs in machine learning data loading.
Why It Matters
As machine learning models grow more data-intensive, efficient data retrieval becomes a bottleneck in training. GetBatch eliminates the per-request overhead of issuing thousands of individual GETs, streamlining data loading and potentially shortening training times for researchers and practitioners in AI.
Key Takeaways
- GetBatch replaces multiple GET requests with a single batch retrieval operation.
- Achieves up to 15x throughput improvement for small objects.
- Reduces P95 batch retrieval latency by 2x and P99 per-object tail latency by 3.7x.
- Enhances efficiency in machine learning training pipelines.
- Offers a fault-tolerant streaming execution model.
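To illustrate why a single batch operation beats many individual GETs, here is a minimal in-memory sketch. All names (`ObjectStore`, `get`, `get_batch`) and the overhead constant are hypothetical stand-ins, not the paper's actual API; the point is only that a fixed per-request cost paid once per object dominates small-object transfer time, while a batch pays it once.

```python
import time

# Simulated fixed cost per request (network round trip, request parsing, etc.).
# The value is illustrative only.
PER_REQUEST_OVERHEAD = 0.0005  # seconds

class ObjectStore:
    """Hypothetical in-memory object store, for illustration only."""

    def __init__(self, objects):
        self._objects = dict(objects)

    def get(self, key):
        # One round trip per object: overhead dominates for small objects.
        time.sleep(PER_REQUEST_OVERHEAD)
        return self._objects[key]

    def get_batch(self, keys):
        # One request for the whole batch: overhead is paid once, amortized.
        time.sleep(PER_REQUEST_OVERHEAD)
        return [self._objects[k] for k in keys]

store = ObjectStore({f"shard/{i}": bytes([i % 256]) for i in range(256)})
keys = [f"shard/{i}" for i in range(256)]

t0 = time.perf_counter()
individually = [store.get(k) for k in keys]
t_gets = time.perf_counter() - t0

t0 = time.perf_counter()
batched = store.get_batch(keys)
t_batch = time.perf_counter() - t0

assert individually == batched
print(f"individual GETs: {t_gets:.3f}s, single batch: {t_batch:.4f}s")
```

With 256 small objects, the per-object path pays the fixed overhead 256 times while the batch path pays it once, which is the amortization effect behind the reported small-object throughput gains.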
Computer Science > Distributed, Parallel, and Cluster Computing

arXiv:2602.22434 (cs.DC) [Submitted on 25 Feb 2026]

Title: GetBatch: Distributed Multi-Object Retrieval for ML Data Loading

Authors: Alex Aizman, Abhishek Gaikwad, Piotr Żelasko

Abstract: Machine learning training pipelines consume data in batches. A single training step may require thousands of samples drawn from shards distributed across a storage cluster. Issuing thousands of individual GET requests incurs per-request overhead that often dominates data transfer time. To solve this problem, we introduce GetBatch - a new object store API that elevates batch retrieval to a first-class storage operation, replacing independent GET operations with a single deterministic, fault-tolerant streaming execution. GetBatch achieves up to 15x throughput improvement for small objects and, in a production training workload, reduces P95 batch retrieval latency by 2x and P99 per-object tail latency by 3.7x compared to individual GET requests.

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Databases (cs.DB); Machine Learning (cs.LG)

DOI: https://doi.org/10.48550/arXiv.2602.22434
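The abstract describes the batch as a single fault-tolerant streaming execution. As a rough sketch of how a client might consume such a stream, the following toy code yields objects one at a time and retries transient per-object failures. Every name here (`fetch_object`, `get_batch_stream`, the failure pattern, the retry policy) is an assumption for illustration; the paper's actual streaming and fault-tolerance semantics may differ.

```python
# Deterministic failure injection: every third fetch fails transiently,
# so each failure is followed by a successful retry. Illustrative only.
_calls = {"n": 0}

def fetch_object(key):
    """Simulated storage fetch with deterministic transient failures."""
    _calls["n"] += 1
    if _calls["n"] % 3 == 0:
        raise ConnectionError(f"transient failure fetching {key}")
    return f"data:{key}".encode()

def get_batch_stream(keys, max_retries=3):
    """Yield (key, data) pairs in order, retrying each object on failure."""
    for key in keys:
        for attempt in range(max_retries):
            try:
                yield key, fetch_object(key)
                break
            except ConnectionError:
                if attempt == max_retries - 1:
                    raise  # give up: surface the failure to the caller

results = dict(get_batch_stream([f"sample/{i}" for i in range(8)]))
print(len(results))
```

Streaming results as they become available, rather than waiting for the whole batch, is one plausible way a batch API keeps the training loop fed while masking individual slow or failed fetches.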