[2602.15253] Scaling Laws for Masked-Reconstruction Transformers on Single-Cell Transcriptomics
Summary
This article summarizes a study of scaling laws for masked-reconstruction transformers applied to single-cell transcriptomics, showing that power-law gains from model size appear only when training data are plentiful.
Why It Matters
Scaling laws tell practitioners when a larger model will pay off. This research shows that in single-cell genomics the benefit of added model capacity depends on how much data is available, informing model design and data-collection priorities for future single-cell analysis and AI applications in biology.
Key Takeaways
- Scaling laws for transformers in single-cell transcriptomics are established.
- Data-rich environments show significant power-law scaling, while data-limited settings do not.
- The data-to-parameter ratio is a key factor influencing model performance.
- The study provides a preliminary estimate of entropy per masked gene position.
- Insights can guide the design of future single-cell foundation models.
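The masked-reconstruction objective behind these takeaways can be sketched in a few lines: a fraction of gene-expression positions is hidden from the model, and validation MSE is computed only over the hidden positions. The matrix shape, the 15% mask rate, the zero-fill masking convention, and the trivial per-cell-mean "model" below are all illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy expression matrix: 8 cells x 16 genes of log-normalized counts.
# Shape, mask rate, and zero-fill convention are illustrative assumptions.
x = rng.gamma(2.0, 1.0, size=(8, 16))

# Mask a fixed 15% of gene positions that the model must reconstruct.
mask = np.zeros(x.shape, dtype=bool)
idx = rng.choice(x.size, size=int(0.15 * x.size), replace=False)
mask.ravel()[idx] = True

x_in = np.where(mask, 0.0, x)  # masked input fed to the model

# Stand-in "model": predict each cell's mean observed expression.
pred = np.broadcast_to(x_in.mean(axis=1, keepdims=True), x.shape)

# Validation loss is the MSE over masked positions only.
mse = float(np.mean((pred[mask] - x[mask]) ** 2))
print(f"masked-reconstruction MSE: {mse:.3f}")
```

A real run would replace the per-cell-mean predictor with a transformer's output; the loss computation over masked positions is the same.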
Abstract
Computer Science > Machine Learning, arXiv:2602.15253 (cs). Submitted on 16 Feb 2026.
Title: Scaling Laws for Masked-Reconstruction Transformers on Single-Cell Transcriptomics
Authors: Ihor Kendiukhov
Neural scaling laws -- power-law relationships between loss, model size, and data -- have been extensively documented for language and vision transformers, yet their existence in single-cell genomics remains largely unexplored. We present the first systematic study of scaling behaviour for masked-reconstruction transformers trained on single-cell RNA sequencing (scRNA-seq) data. Using expression profiles from the CELLxGENE Census, we construct two experimental regimes: a data-rich regime (512 highly variable genes, 200,000 cells) and a data-limited regime (1,024 genes, 10,000 cells). Across seven model sizes spanning three orders of magnitude in parameter count (533 to 3.4 x 10^8 parameters), we fit the parametric scaling law to validation mean squared error (MSE). The data-rich regime exhibits clear power-law scaling with an irreducible loss floor of c ~ 1.44, while the data-limited regime shows negligible scaling, indicating that model capacity is not the binding constraint when data are scarce. These results establish that scaling laws analogous to those observed in natural language processing do emerge in ...
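Parametric scaling laws of the kind fitted above typically take the form L(N) = a * N^(-alpha) + c, where N is parameter count and c is the irreducible loss floor; the exact functional form and data used by the paper are assumptions here. A minimal sketch of such a fit on synthetic data, grid-searching the floor c and solving the remaining power law linearly in log-log space:

```python
import numpy as np

# Hypothetical (parameter count, validation MSE) pairs -- synthetic data
# generated from an assumed law, not the paper's measurements.
n = np.array([5.3e2, 4.2e3, 3.3e4, 2.6e5, 2.1e6, 1.7e7, 3.4e8])
loss = 8.0 * n ** -0.15 + 1.44

def fit_power_law(n, loss):
    """Fit L(N) = a * N^(-alpha) + c by grid-searching the floor c;
    for each candidate c, the residual law is linear in log-log space."""
    best = None
    for c in np.linspace(0.0, loss.min() - 1e-6, 500):
        y = np.log(loss - c)          # log of the reducible loss
        x = np.log(n)
        slope, intercept = np.polyfit(x, y, 1)
        resid = np.sum((y - (slope * x + intercept)) ** 2)
        if best is None or resid < best[0]:
            best = (resid, np.exp(intercept), -slope, c)
    _, a, alpha, c = best
    return a, alpha, c

a, alpha, c = fit_power_law(n, loss)
print(f"a ~ {a:.2f}, alpha ~ {alpha:.3f}, c ~ {c:.2f}")
```

On the synthetic data the fit recovers the planted floor c near 1.44, mirroring the paper's reported value; with real validation losses one would typically use a nonlinear least-squares routine instead of a grid search.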