[2604.01949] annbatch unlocks terabyte-scale training of biological data in anndata

arXiv:2604.01949 (cs) [Submitted on 2 Apr 2026]

Title: annbatch unlocks terabyte-scale training of biological data in anndata

Authors: Ilan Gold, Felix Fischer, Lucas Arnoldt, F. Alexander Wolf, Fabian J. Theis

Abstract: The scale of biological datasets now routinely exceeds system memory, making data access rather than model computation the primary bottleneck in training machine-learning models. This bottleneck is particularly acute in biology, where widely used community data formats must support heterogeneous metadata, sparse and dense assays, and downstream analysis within established computational ecosystems. Here we present annbatch, a mini-batch loader native to anndata that enables out-of-core training directly on disk-backed datasets. Across single-cell transcriptomics, microscopy and whole-genome sequencing benchmarks, annbatch increases loading throughput by up to an order of magnitude and shortens training from days to hours, while remaining fully compatible with the scverse ecosystem. Annbatch establishes a practical data-loading infrastructure for scalable biological AI, allowing increasingly large and diverse datasets to be used without abandoning standard biological data formats.

Subjects: Machine Learning (cs.LG); Genomics (q-bio.GN)

Cite as: arXiv:2604.01949...
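To make the idea of out-of-core mini-batch loading concrete, here is a minimal sketch of one generic way to do it. It is not annbatch's API or implementation, which the abstract does not describe. The sketch assumes a dense row-major float32 matrix stored in a flat binary file as a stand-in for a disk-backed dataset. The function names (`write_rows`, `iter_minibatches`, `demo`) and the chunk-then-shuffle strategy are hypothetical choices made for illustration. Only one contiguous chunk of rows is read into memory at a time: chunks are visited in random order, rows are shuffled within each chunk, and fixed-size batches are emitted from a small buffer.

```python
import array
import os
import random
import tempfile
from typing import Iterator, List

Row = List[float]


def write_rows(path: str, rows: List[Row]) -> None:
    """Write a dense row-major float32 matrix to a flat binary file."""
    with open(path, "wb") as fh:
        for row in rows:
            array.array("f", row).tofile(fh)


def iter_minibatches(
    path: str,
    n_rows: int,
    n_cols: int,
    chunk_rows: int,
    batch_size: int,
    seed: int = 0,
) -> Iterator[List[Row]]:
    """Yield shuffled mini-batches while holding only one chunk in memory."""
    rng = random.Random(seed)
    row_bytes = n_cols * array.array("f").itemsize

    # Visit contiguous chunks in random order: each read stays sequential
    # on disk, while the order of chunks provides coarse shuffling.
    chunk_starts = list(range(0, n_rows, chunk_rows))
    rng.shuffle(chunk_starts)

    buffer: List[Row] = []
    with open(path, "rb") as fh:
        for start in chunk_starts:
            stop = min(start + chunk_rows, n_rows)
            fh.seek(start * row_bytes)
            flat = array.array("f")
            flat.fromfile(fh, (stop - start) * n_cols)

            # Split the flat chunk into rows and shuffle within the chunk.
            chunk = [
                flat[i * n_cols:(i + 1) * n_cols].tolist()
                for i in range(stop - start)
            ]
            rng.shuffle(chunk)
            buffer.extend(chunk)

            # Emit full batches; leftover rows carry over to the next chunk.
            while len(buffer) >= batch_size:
                yield buffer[:batch_size]
                buffer = buffer[batch_size:]

    if buffer:  # final partial batch
        yield buffer


def demo() -> List[List[Row]]:
    """Build a tiny 10 x 3 on-disk matrix and load it in batches of 3."""
    n_rows, n_cols = 10, 3
    rows = [[float(i)] * n_cols for i in range(n_rows)]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "matrix.f32")
        write_rows(path, rows)
        return list(
            iter_minibatches(path, n_rows, n_cols, chunk_rows=4, batch_size=3)
        )
```

The trade-off in this sketch is between shuffling quality and read efficiency: larger chunks mean fewer, longer sequential reads but less mixing between distant rows. A real loader over sparse or compressed on-disk formats would need to handle far more than this flat-file example does.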

Originally published on April 03, 2026. Curated by AI News.
