[2602.15253] Scaling Laws for Masked-Reconstruction Transformers on Single-Cell Transcriptomics


Summary

This article summarizes a study of scaling laws for masked-reconstruction transformers applied to single-cell transcriptomics, showing that clear power-law scaling emerges only when training data are plentiful.

Why It Matters

Understanding scaling laws in machine learning models, particularly in genomics, is crucial for optimizing model design and data utilization. This research shows that data quantity determines whether additional model capacity improves performance, informing the design of future single-cell foundation models and AI applications in biology.

Key Takeaways

  • Scaling laws for transformers in single-cell transcriptomics are established.
  • Data-rich environments show significant power-law scaling, while data-limited settings do not.
  • The data-to-parameter ratio is a key factor influencing model performance.
  • The study provides a preliminary estimate of entropy per masked gene position.
  • Insights can guide the design of future single-cell foundation models.
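The takeaway about "entropy per masked gene position" can be made concrete. The paper's exact estimation method is not given in this summary, so the sketch below uses one common assumption: model the residual at each masked position as Gaussian with variance equal to the irreducible MSE floor (c ~ 1.44, from the abstract), whose differential entropy is ½·log₂(2πe·σ²) bits. This is an illustrative back-of-the-envelope calculation, not necessarily the paper's procedure.

```python
import math

def gaussian_entropy_bits(mse_floor: float) -> float:
    """Differential entropy, in bits, of a Gaussian with variance mse_floor.

    Assumption: residuals at a masked gene position are approximately
    Gaussian, so the irreducible MSE floor bounds the per-position entropy.
    """
    return 0.5 * math.log2(2 * math.pi * math.e * mse_floor)

c = 1.44  # irreducible loss floor reported for the data-rich regime
print(f"entropy per masked position ~ {gaussian_entropy_bits(c):.2f} bits")
```

Under this Gaussian assumption the floor of 1.44 corresponds to roughly 2.3 bits per masked position; a different residual model would give a different figure.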

Computer Science > Machine Learning
arXiv:2602.15253 (cs)
[Submitted on 16 Feb 2026]
Title: Scaling Laws for Masked-Reconstruction Transformers on Single-Cell Transcriptomics
Authors: Ihor Kendiukhov
Abstract: Neural scaling laws -- power-law relationships between loss, model size, and data -- have been extensively documented for language and vision transformers, yet their existence in single-cell genomics remains largely unexplored. We present the first systematic study of scaling behaviour for masked-reconstruction transformers trained on single-cell RNA sequencing (scRNA-seq) data. Using expression profiles from the CELLxGENE Census, we construct two experimental regimes: a data-rich regime (512 highly variable genes, 200,000 cells) and a data-limited regime (1,024 genes, 10,000 cells). Across seven model sizes spanning parameter counts from 533 to 3.4 x 10^8, we fit a parametric scaling law to validation mean squared error (MSE). The data-rich regime exhibits clear power-law scaling with an irreducible loss floor of c ~ 1.44, while the data-limited regime shows negligible scaling, indicating that model capacity is not the binding constraint when data are scarce. These results establish that scaling laws analogous to those observed in natural language processing do emerge in ...
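The abstract describes fitting a parametric scaling law to validation MSE across model sizes. A minimal sketch of that kind of fit, assuming the standard three-parameter form L(N) = a·N^(−b) + c with an irreducible floor c (the exact functional form and all data points below are illustrative assumptions, not the paper's numbers):

```python
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(n_params, a, b, c):
    """Parametric power law with irreducible floor: L(N) = a * N^-b + c."""
    return a * n_params ** (-b) + c

# Hypothetical (parameter count, validation MSE) pairs for seven model
# sizes spanning 533 to 3.4e8 parameters, generated from known constants
# (a=10, b=0.15, c=1.44) plus small noise so the fit has something to do.
n = np.array([5.33e2, 5e3, 5e4, 5e5, 5e6, 5e7, 3.4e8])
rng = np.random.default_rng(0)
mse = scaling_law(n, a=10.0, b=0.15, c=1.44) + rng.normal(0, 0.01, size=n.size)

# Recover (a, b, c) by nonlinear least squares.
(a_hat, b_hat, c_hat), _ = curve_fit(scaling_law, n, mse, p0=[1.0, 0.1, 1.0])
print(f"fitted floor c ~ {c_hat:.2f}")
```

In practice such fits are usually done on log-spaced model sizes, and the estimated floor c is the quantity interpreted as irreducible loss; here the fit recovers the floor used to generate the synthetic data.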
