[2510.07182] Bridged Clustering: Semi-Supervised Sparse Bridging


Summary

The paper introduces Bridged Clustering, a semi-supervised framework that learns predictors from unpaired datasets by clustering the inputs and outputs independently, then linking the clusters through a sparse bridge learned from only a few paired examples.

Why It Matters

Bridged Clustering offers a novel approach to semi-supervised learning by explicitly exploiting unpaired data, making it relevant for applications where paired labeled data is scarce. Its efficiency and simplicity could improve predictive modeling across a range of machine learning tasks.

Key Takeaways

  • Bridged Clustering leverages unpaired input-output datasets for predictions.
  • The method maintains sparsity and interpretability in the bridging process.
  • Empirical results show competitiveness with state-of-the-art methods.
  • The framework is model-agnostic and efficient in low-supervision scenarios.
  • Theoretical analysis supports the effectiveness of the algorithm.

Computer Science > Machine Learning

arXiv:2510.07182 (cs)

[Submitted on 8 Oct 2025 (v1), last revised 13 Feb 2026 (this version, v3)]

Title: Bridged Clustering: Semi-Supervised Sparse Bridging

Authors: Patrick Peixuan Ye, Chen Shani, Ellen Vitercik

Abstract: We introduce Bridged Clustering, a semi-supervised framework to learn predictors from any unpaired input $X$ and output $Y$ dataset. Our method first clusters $X$ and $Y$ independently, then learns a sparse, interpretable bridge between clusters using only a few paired examples. At inference, a new input $x$ is assigned to its nearest input cluster, and the centroid of the linked output cluster is returned as the prediction $\hat{y}$. Unlike traditional SSL, Bridged Clustering explicitly leverages output-only data, and unlike dense transport-based methods, it maintains a sparse and interpretable alignment. Through theoretical analysis, we show that with bounded mis-clustering and mis-bridging rates, our algorithm becomes an effective and efficient predictor. Empirically, our method is competitive with SOTA methods while remaining simple, model-agnostic, and highly label-efficient in low-supervision settings.

Subjects: Machine Learning (cs.LG)

Cite as: arXiv:2510.07182 [cs.LG] (or arXiv:2510.07182v3 [cs.LG] for this version)

DOI: https://doi.org/10.48550/arXiv.2510.07182
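The three-step pipeline described in the abstract (cluster $X$ and $Y$ independently, bridge clusters from a few paired examples, predict via the linked output centroid) can be sketched in plain Python. The 1-D synthetic data, the simple k-means helper, and the three paired examples below are illustrative assumptions for this sketch, not the paper's implementation:

```python
import random

# Illustrative unpaired data: X has clusters near 0, 5, 10; Y near 1, 2, 3.
random.seed(0)
X = [random.gauss(c, 0.1) for c in (0.0, 5.0, 10.0) for _ in range(50)]
Y = [random.gauss(c, 0.1) for c in (1.0, 2.0, 3.0) for _ in range(50)]

def kmeans_1d(data, centers, iters=20):
    """Plain 1-D k-means; returns the final centroids."""
    centers = list(centers)
    for _ in range(iters):
        groups = [[] for _ in centers]
        for v in data:
            i = min(range(len(centers)), key=lambda j: abs(v - centers[j]))
            groups[i].append(v)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers

def assign(v, centers):
    """Index of the nearest centroid."""
    return min(range(len(centers)), key=lambda j: abs(v - centers[j]))

# Step 1: cluster X and Y independently (no pairing used here).
cx = kmeans_1d(X, [min(X), sum(X) / len(X), max(X)])
cy = kmeans_1d(Y, [min(Y), sum(Y) / len(Y), max(Y)])

# Step 2: learn a sparse bridge from a handful of paired (x, y) examples.
pairs = [(0.05, 1.02), (4.9, 2.01), (10.1, 2.97)]  # hypothetical labeled pairs
bridge = {assign(x, cx): assign(y, cy) for x, y in pairs}

# Step 3: predict y for a new x — nearest input cluster, then the
# centroid of the output cluster it is bridged to.
def predict(x):
    return cy[bridge[assign(x, cx)]]
```

With well-separated clusters, `predict(0.1)` lands near 1.0, `predict(5.0)` near 2.0, and `predict(9.8)` near 3.0; only the three pairs ever see joint supervision, which is the label efficiency the paper highlights.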

