[2506.14861] BMFM-RNA: whole-cell expression decoding improves transcriptomic foundation models
Quantitative Biology > Genomics
arXiv:2506.14861 (q-bio)
[Submitted on 17 Jun 2025 (v1), last revised 26 Mar 2026 (this version, v2)]

Title: BMFM-RNA: whole-cell expression decoding improves transcriptomic foundation models

Authors: Michael M. Danziger, Bharath Dandala, Viatcheslav Gurev, Matthew Madgwick, Sivan Ravid, Tim Rumbell, Akira Koseki, Tal Kozlovski, Ching-Huei Tsou, Ella Barkan, Tanwi Biswas, Jielin Xu, Yishai Shimoni, Jianying Hu, Michal Rosen-Zvi

Abstract: Transcriptomic foundation models pretrained with masked language modeling can achieve low pretraining loss yet produce poor cell representations for downstream tasks. We introduce whole-cell expression decoding (WCED), where models reconstruct the entire gene vocabulary from a single CLS token embedding, even with limited inputs, creating a maximally informative bottleneck. WCED consistently outperforms MLM on all downstream metrics despite higher reconstruction error during training. Gene-level error tracking reveals that both methods preferentially learn genes whose expression co-varies with stable transcriptional programs rather than those driven by transient factors. We further add hierarchical cross-entropy loss that exploits Cell Ontology structure for zero-shot annotation at multiple granularity levels. Models trained w...
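
To make the WCED idea from the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a single linear decoding head and a mean-squared-error objective, neither of which is specified in the abstract. The names `toy_encoder`, `wced_loss`, `N_GENES`, `D_MODEL`, and `N_INPUT` are hypothetical, and the encoder is a stand-in for a transformer. The point it illustrates is the bottleneck: the encoder sees only a subset of genes, yet the loss is computed over the whole gene vocabulary from one CLS-like vector.

```python
import numpy as np

rng = np.random.default_rng(0)

N_GENES = 1000   # hypothetical size of the gene vocabulary
D_MODEL = 64     # hypothetical width of the CLS embedding
N_INPUT = 100    # hypothetical number of genes the encoder actually sees


def toy_encoder(expression, input_idx, projection):
    """Stand-in for a transformer encoder (an assumption, not the paper's model).

    Projects the observed subset of genes to one CLS-like vector per cell.
    """
    return np.tanh(expression[:, input_idx] @ projection)


def wced_loss(cls_embedding, weights, bias, full_expression):
    """Decode every gene from the CLS vector and score against the full profile.

    cls_embedding:   (batch, D_MODEL) summary vector per cell.
    weights, bias:   parameters of a single linear decoding head (assumed).
    full_expression: (batch, N_GENES) target expression for all genes,
                     including genes the encoder never saw.
    Returns the mean squared error (assumed objective) and the predictions.
    """
    predicted = cls_embedding @ weights + bias  # (batch, N_GENES)
    loss = float(np.mean((predicted - full_expression) ** 2))
    return loss, predicted


# Synthetic log-normalized counts for a small batch of cells.
batch = 8
expression = np.log1p(rng.poisson(1.0, size=(batch, N_GENES)).astype(float))

# The encoder receives only N_INPUT of the N_GENES genes.
input_idx = rng.choice(N_GENES, size=N_INPUT, replace=False)
projection = rng.normal(scale=0.1, size=(N_INPUT, D_MODEL))

# Randomly initialized decoding head.
weights = rng.normal(scale=0.1, size=(D_MODEL, N_GENES))
bias = np.zeros(N_GENES)

cls = toy_encoder(expression, input_idx, projection)
loss, predicted = wced_loss(cls, weights, bias, expression)
print(cls.shape, predicted.shape)
```

Under these assumptions, the contrast with masked language modeling is that an MLM objective would score only the masked input positions from their own token embeddings, whereas here all `N_GENES` targets must be recovered from the single `cls` vector.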