[2602.16696] Parameter-free representations outperform single-cell foundation models on downstream benchmarks
Summary
This paper demonstrates that parameter-free representations can outperform single-cell foundation models across a range of benchmarks, suggesting that simpler, interpretable methods can capture the relevant biological structure of the data.
Why It Matters
The findings challenge the reliance on complex deep learning models in genomics, advocating for simpler, interpretable methods that can achieve state-of-the-art performance. This has implications for efficiency and accessibility in biological research, potentially democratizing advanced analysis techniques.
Key Takeaways
- Parameter-free methods can achieve state-of-the-art performance in single-cell RNA sequencing tasks.
- Simple linear representations may effectively capture the biology of cell identity.
- The study highlights the importance of rigorous benchmarking in evaluating model performance.
- Outperforming complex models on out-of-distribution tasks suggests robustness in simpler approaches.
- The findings could lead to more accessible methods for researchers in genomics.
Quantitative Biology > Genomics
arXiv:2602.16696 (q-bio) [Submitted on 18 Feb 2026]
Title: Parameter-free representations outperform single-cell foundation models on downstream benchmarks
Authors: Huan Souza, Pankaj Mehta
Abstract: Single-cell RNA sequencing (scRNA-seq) data exhibit strong and reproducible statistical structure. This has motivated the development of large-scale foundation models, such as TranscriptFormer, that use transformer-based architectures to learn a generative model for gene expression by embedding genes into a latent vector space. These embeddings have been used to obtain state-of-the-art (SOTA) performance on downstream tasks such as cell-type classification, disease-state prediction, and cross-species learning. Here, we ask whether similar performance can be achieved without utilizing computationally intensive deep learning-based representations. Using simple, interpretable pipelines that rely on careful normalization and linear methods, we obtain SOTA or near-SOTA performance across multiple benchmarks commonly used to evaluate single-cell foundation models, including outperforming foundation models on out-of-distribution tasks involving novel cell types and organisms absent from the training data. Our findings highlight the need for rigorous benchmarking and ...
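To make the abstract's idea concrete, here is a minimal sketch of the kind of pipeline it describes: careful normalization followed by a linear representation and a linear readout. This is an illustrative example, not the authors' exact method; the synthetic count matrix, component count, and nearest-centroid classifier are assumptions chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for an scRNA-seq count matrix: 600 cells x 200 genes,
# three "cell types" drawn from distinct gene-expression programs, with
# per-cell library-size variation (the usual scRNA-seq nuisance factor).
n_cells, n_genes, n_types = 600, 200, 3
labels = rng.integers(0, n_types, size=n_cells)
programs = rng.gamma(2.0, 1.0, size=(n_types, n_genes))
counts = rng.poisson(programs[labels] * rng.uniform(0.5, 2.0, size=(n_cells, 1)))

# 1) Careful normalization: scale each cell to a common library size,
#    then apply a log1p variance-stabilizing transform.
lib = counts.sum(axis=1, keepdims=True)
norm = np.log1p(counts / lib * 1e4)

# 2) Linear representation: PCA computed directly via SVD of the
#    mean-centered matrix (no learned parameters beyond the data itself).
centered = norm - norm.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
embedding = centered @ Vt[:20].T  # top 20 principal components

# 3) Simple linear readout: nearest-centroid cell-type classification
#    on a held-out split of the embedding.
train = np.arange(n_cells) % 2 == 0
test = ~train
centroids = np.stack(
    [embedding[train & (labels == t)].mean(axis=0) for t in range(n_types)]
)
dists = np.linalg.norm(embedding[test, None, :] - centroids[None, :, :], axis=2)
pred = dists.argmin(axis=1)
accuracy = (pred == labels[test]).mean()
print(f"held-out cell-type accuracy: {accuracy:.2f}")
```

On well-separated synthetic types like these, such a pipeline classifies nearly perfectly, which illustrates the paper's broader point: much of the signal used for cell-type classification is linearly accessible after appropriate normalization.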