[2505.00940] StablePCA: Distributionally Robust Learning of Representations from Multi-Source Data
Computer Science > Machine Learning

arXiv:2505.00940 (cs)

[Submitted on 2 May 2025 (v1), last revised 3 Mar 2026 (this version, v2)]

Title: StablePCA: Distributionally Robust Learning of Representations from Multi-Source Data

Authors: Zhenyu Wang, Molei Liu, Jing Lei, Francis Bach, Zijian Guo

Abstract: When synthesizing multi-source high-dimensional data, a key objective is to extract low-dimensional representations that effectively approximate the original features across different sources. Such representations facilitate the discovery of transferable structures and help mitigate systematic biases such as batch effects. We introduce Stable Principal Component Analysis (StablePCA), a distributionally robust framework for constructing stable latent representations by maximizing the worst-case explained variance over multiple sources. A primary challenge in extending classical PCA to the multi-source setting lies in the nonconvex rank constraint, which renders the StablePCA formulation a nonconvex optimization problem. To overcome this challenge, we derive a convex relaxation of StablePCA and develop an efficient Mirror-Prox algorithm to solve the relaxed problem, with global convergence guarantees. Since the relaxed problem generally differs from the original formulation, we further introduce a da...
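To make the worst-case objective concrete, here is a minimal NumPy sketch of the idea the abstract describes: maximize the minimum explained variance min_k tr(Σ_k M) over the Fantope {M : 0 ⪯ M ⪯ I, tr M = r}, the standard convex relaxation of the rank-r projection set. Note this sketch uses plain projected subgradient ascent with a decaying step, not the paper's Mirror-Prox algorithm, and all function names (`project_fantope`, `stable_pca_sketch`) are illustrative, not from the paper.

```python
import numpy as np

def project_fantope(M, r):
    """Project a symmetric matrix onto {X : 0 <= X <= I, tr X = r}.

    Works on the eigenvalues: shift them by theta, clip to [0, 1], and
    pick theta by bisection so the clipped eigenvalues sum to r.
    """
    vals, vecs = np.linalg.eigh((M + M.T) / 2)
    lo, hi = vals.min() - 1.0, vals.max()  # s(lo) >= r >= s(hi) for r <= dim
    for _ in range(100):
        theta = (lo + hi) / 2
        if np.clip(vals - theta, 0.0, 1.0).sum() > r:
            lo = theta
        else:
            hi = theta
    lam = np.clip(vals - (lo + hi) / 2, 0.0, 1.0)
    return (vecs * lam) @ vecs.T  # V diag(lam) V^T

def stable_pca_sketch(covs, r, steps=500, lr=0.05):
    """Maximize min_k tr(Sigma_k M) over the Fantope of trace r.

    covs: list of d x d source covariance matrices Sigma_k.
    Projected subgradient ascent: the subgradient at M is the covariance
    of the currently worst-explained source.
    """
    d = covs[0].shape[0]
    M = np.eye(d) * (r / d)  # feasible starting point
    for t in range(steps):
        k = int(np.argmin([np.trace(S @ M) for S in covs]))  # worst source
        M = project_fantope(M + (lr / np.sqrt(t + 1)) * covs[k], r)
    return M
```

A toy usage with two sources whose variance concentrates on opposite coordinates: pooled PCA picks a direction that explains almost nothing for one source, while the worst-case solution splits its budget so both sources are explained.

```python
covs = [np.diag([3.0, 1.0, 0.1]), np.diag([0.1, 1.0, 3.0])]
M = stable_pca_sketch(covs, r=1)
worst = min(np.trace(S @ M) for S in covs)  # near 1.55 at the optimum
```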