[2405.12317] Kernel spectral joint embeddings for high-dimensional noisy datasets using duo-landmark integral operators

arXiv - Machine Learning March 02, 2026 4 min read

Statistics > Machine Learning

arXiv:2405.12317 (stat)

[Submitted on 20 May 2024 (v1), last revised 27 Feb 2026 (this version, v3)]

Title: Kernel spectral joint embeddings for high-dimensional noisy datasets using duo-landmark integral operators

Authors: Xiucai Ding, Rong Ma

Abstract: Integrative analysis of multiple heterogeneous datasets has become standard practice in many research fields, especially in single-cell genomics and medical informatics. Existing approaches oftentimes suffer from limited power in capturing nonlinear structures, insufficient account of noisiness and effects of high-dimensionality, lack of adaptivity to signals and sample sizes imbalance, and their results are sometimes difficult to interpret. To address these limitations, we propose a novel kernel spectral method that achieves joint embeddings of two independently observed high-dimensional noisy datasets. The proposed method automatically captures and leverages possibly shared low-dimensional structures across datasets to enhance embedding quality. The obtained low-dimensional embeddings can be utilized for many downstream tasks such as simultaneous clustering, data visualization, and denoising. The proposed method is justified by rigorous theoretical analysis. Specifically, we show the consistency of our ...
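To make the idea of a kernel spectral joint embedding of two datasets concrete, here is a minimal illustrative sketch. It is not the authors' algorithm, since the abstract does not spell out the construction. It assumes a Gaussian kernel evaluated between every sample of one dataset and every sample of the other, a median-heuristic bandwidth, and a plain singular value decomposition of that cross-kernel matrix. The function name `joint_embedding` and all of its parameters are hypothetical.

```python
import numpy as np


def joint_embedding(X, Y, dim=2, bandwidth=None):
    """Illustrative cross-kernel spectral embedding of two datasets.

    X: (n1, p) array, Y: (n2, p) array over the same p features.
    Returns embeddings of shape (n1, dim) and (n2, dim) plus the
    singular values of the cross-kernel matrix.
    """
    # Squared Euclidean distance between each row of X and each row of Y.
    sq = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    sq = np.maximum(sq, 0.0)  # clip tiny negatives caused by round-off

    # Assumption: median heuristic for the bandwidth (not taken from the paper).
    if bandwidth is None:
        bandwidth = np.median(sq)

    # Rectangular Gaussian kernel matrix linking the two datasets.
    K = np.exp(-sq / bandwidth)

    # Left singular vectors embed the rows of X, right ones the rows of Y.
    U, s, Vt = np.linalg.svd(K, full_matrices=False)
    emb_x = U[:, :dim] * s[:dim]
    emb_y = Vt[:dim].T * s[:dim]
    return emb_x, emb_y, s
```

For example, calling `joint_embedding(X, Y, dim=3)` on a 30-by-50 array and a 20-by-50 array returns one 30-by-3 and one 20-by-3 embedding. Because both embeddings come from the same decomposition, samples from the two datasets can be plotted or clustered together, which is the kind of downstream use the abstract mentions.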

Originally published on March 02, 2026. Curated by AI News.
