[2602.22434] GetBatch: Distributed Multi-Object Retrieval for ML Data Loading


arXiv - Machine Learning 3 min read Article

Summary

GetBatch introduces a new object store API that elevates batch retrieval to a first-class operation in machine learning data loading, delivering up to 15x higher throughput for small objects than issuing individual GET requests.

Why It Matters

As machine learning models become increasingly data-intensive, efficient data retrieval is crucial for keeping training pipelines fed. GetBatch eliminates the per-request overhead of issuing many individual GETs, streamlining data loading and potentially shortening model training times.

Key Takeaways

  • GetBatch replaces multiple GET requests with a single batch retrieval operation.
  • Achieves up to 15x throughput improvement for small objects.
  • Reduces P95 batch retrieval latency by 2x and P99 per-object tail latency by 3.7x.
  • Enhances efficiency in machine learning training pipelines.
  • Offers a fault-tolerant streaming execution model.
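The takeaways above can be illustrated with a toy sketch of why batching helps. The classes and the fixed-overhead model below are hypothetical, not the paper's actual GetBatch or object-store API: each individual `get()` pays a simulated round-trip cost, while `get_batch()` amortizes that cost over the whole batch and streams objects back in the order requested.

```python
import time

# Simulated fixed cost per round trip (seconds); an assumption for illustration.
PER_REQUEST_OVERHEAD = 0.0005

class SimpleObjectStore:
    """Toy in-memory store; every request pays a fixed round-trip overhead."""
    def __init__(self, objects):
        self._objects = dict(objects)

    def get(self, key):
        time.sleep(PER_REQUEST_OVERHEAD)  # one round trip per object
        return self._objects[key]

    def get_batch(self, keys):
        # One round trip amortized across all keys; objects stream back
        # in the deterministic order they were requested.
        time.sleep(PER_REQUEST_OVERHEAD)
        for key in keys:
            yield key, self._objects[key]

# 1000 small objects, as in a typical training step's sample batch.
store = SimpleObjectStore({f"shard/{i}": bytes([i % 256]) * 64 for i in range(1000)})
keys = [f"shard/{i}" for i in range(1000)]

t0 = time.perf_counter()
individual = [store.get(k) for k in keys]
t_gets = time.perf_counter() - t0

t0 = time.perf_counter()
batched = [obj for _, obj in store.get_batch(keys)]
t_batch = time.perf_counter() - t0

print(f"1000 GETs: {t_gets:.3f}s; one GetBatch: {t_batch:.4f}s")
```

Under this model the batched path is faster by roughly the number of objects, which mirrors the paper's observation that per-request overhead, not data transfer, dominates for small objects.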

Computer Science > Distributed, Parallel, and Cluster Computing

arXiv:2602.22434 (cs.DC) · Submitted on 25 Feb 2026

Title: GetBatch: Distributed Multi-Object Retrieval for ML Data Loading

Authors: Alex Aizman, Abhishek Gaikwad, Piotr Żelasko

Abstract: Machine learning training pipelines consume data in batches. A single training step may require thousands of samples drawn from shards distributed across a storage cluster. Issuing thousands of individual GET requests incurs per-request overhead that often dominates data transfer time. To solve this problem, we introduce GetBatch - a new object store API that elevates batch retrieval to a first-class storage operation, replacing independent GET operations with a single deterministic, fault-tolerant streaming execution. GetBatch achieves up to 15x throughput improvement for small objects and, in a production training workload, reduces P95 batch retrieval latency by 2x and P99 per-object tail latency by 3.7x compared to individual GET requests.

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Databases (cs.DB); Machine Learning (cs.LG)

Cite as: arXiv:2602.22434 [cs.DC] · https://doi.org/10.48550/arXiv.2602.22434
