[2509.15429] Random Matrix Theory-guided sparse PCA for single-cell RNA-seq data

arXiv - Machine Learning 4 min read Research

Summary

This paper presents a Random Matrix Theory-guided approach to sparse PCA for single-cell RNA-seq data, enhancing dimensionality reduction and cell-type classification accuracy.

Why It Matters

Single-cell RNA-seq is central to the study of cellular heterogeneity, but biological and technical noise complicate its analysis. This work proposes a method that improves dimensionality reduction and cell-type classification, with potential relevance to biological research and clinical applications.

Key Takeaways

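  • Introduces a biwhitening algorithm that self-consistently estimates the magnitude of transcriptomic noise affecting each gene in individual cells, without assuming a specific noise distribution.
  • Uses a Random Matrix Theory criterion to select the sparsity level automatically, making sparse PCA nearly parameter-free.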
  • Demonstrates improved performance over traditional PCA and other methods in cell-type classification tasks.
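
To make the idea of biwhitening concrete, here is a minimal sketch of the classic Sinkhorn-Knopp row and column scaling, which is not the paper's algorithm. It assumes a strictly positive matrix V of per-entry noise variances is already known, whereas the paper estimates the noise self-consistently. The function name sinkhorn_scale and the synthetic V and X are hypothetical, chosen only for illustration.

```python
import numpy as np

def sinkhorn_scale(V, n_iter=500, tol=1e-10):
    """Find positive r, c so that diag(r) @ V @ diag(c) has unit row and column means.

    Illustrative Sinkhorn-Knopp iteration; assumes V is strictly positive.
    """
    n, p = V.shape
    r = np.ones(n)
    c = np.ones(p)
    for _ in range(n_iter):
        r = p / (V @ c)      # each scaled row now sums to p (row mean 1)
        c = n / (V.T @ r)    # each scaled column now sums to n (column mean 1)
        S = r[:, None] * V * c[None, :]
        if np.abs(S.mean(axis=1) - 1.0).max() < tol:
            break
    return r, c

# Synthetic example: heterogeneous noise variances for 50 cells x 30 genes (assumed known).
rng = np.random.default_rng(0)
V = rng.uniform(0.5, 2.0, size=(50, 30))
X = rng.standard_normal(V.shape) * np.sqrt(V)   # noise with variance V[i, j]

r, c = sinkhorn_scale(V)
S = r[:, None] * V * c[None, :]                  # scaled variance matrix
X_scaled = np.sqrt(r)[:, None] * X * np.sqrt(c)[None, :]
print(S.mean(axis=0).round(6)[:3], S.mean(axis=1).round(6)[:3])
```

Scaling the data by the square roots of r and c gives noise whose average variance is one along every row and every column, which is the condition under which random-matrix predictions for the noise spectrum are usually stated.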

Computer Science > Machine Learning

arXiv:2509.15429 (cs)

[Submitted on 18 Sep 2025 (v1), last revised 26 Feb 2026 (this version, v2)]

Title: Random Matrix Theory-guided sparse PCA for single-cell RNA-seq data

Authors: Victor Chardès

Abstract: Single-cell RNA-seq provides detailed molecular snapshots of individual cells but is notoriously noisy. Variability stems from biological differences and technical factors, such as amplification bias and limited RNA capture efficiency, making it challenging to adapt computational pipelines to heterogeneous datasets or evolving technologies. As a result, most studies still rely on principal component analysis (PCA) for dimensionality reduction, valued for its interpretability and robustness, in spite of its known bias in high dimensions. Here, we improve upon PCA with a Random Matrix Theory (RMT)-based approach that guides the inference of sparse principal components using existing sparse PCA algorithms. We first introduce a novel biwhitening algorithm which self-consistently estimates the magnitude of transcriptomic noise affecting each gene in individual cells, without assuming a specific noise distribution. This enables the use of an RMT-based criterion to automatically select the sparsity level, rendering sparse PCA nearly parameter-free. Our mathematically grounded approach retains the ...
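
As a rough illustration of why normalized noise makes an RMT-based criterion possible, the sketch below uses the standard Marchenko-Pastur upper edge to separate signal eigenvalues from the noise bulk. This is a generic textbook use of RMT, not the paper's sparsity-selection criterion. It assumes i.i.d. noise with unit variance; the function names, the 5% safety margin, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def mp_upper_edge(n_samples, n_features, sigma2=1.0):
    """Marchenko-Pastur upper edge for the eigenvalues of X.T @ X / n_samples
    when X is pure i.i.d. noise with variance sigma2."""
    gamma = n_features / n_samples
    return sigma2 * (1.0 + np.sqrt(gamma)) ** 2

def count_signal_components(X, sigma2=1.0, margin=0.05):
    """Count sample-covariance eigenvalues above the noise edge.

    margin is a heuristic cushion (an assumption of this sketch) against
    finite-size fluctuations of the largest noise eigenvalue.
    """
    n_samples, n_features = X.shape
    singular_values = np.linalg.svd(X, compute_uv=False)
    eigenvalues = singular_values ** 2 / n_samples
    threshold = (1.0 + margin) * mp_upper_edge(n_samples, n_features, sigma2)
    return int(np.sum(eigenvalues > threshold))

# Synthetic data: 2000 "cells" x 500 "genes" of unit-variance noise,
# plus two planted low-rank signal directions.
rng = np.random.default_rng(0)
n_cells, n_genes = 2000, 500
X_noise = rng.standard_normal((n_cells, n_genes))
scores = rng.standard_normal((n_cells, 2))
loadings, _ = np.linalg.qr(rng.standard_normal((n_genes, 2)))
X_spiked = X_noise + scores @ (loadings * np.array([4.0, 3.0])).T

print(count_signal_components(X_noise), count_signal_components(X_spiked))
```

The point of the sketch is only that, once noise has a known scale, the noise spectrum has a predictable edge, so a threshold can be derived from the matrix dimensions rather than tuned by hand.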
