[2602.17115] Semi-Supervised Learning on Graphs using Graph Neural Networks
Summary
The paper explores the effectiveness of Graph Neural Networks (GNNs) in semi-supervised learning, providing theoretical insights and empirical validation of their performance in node regression tasks.
Why It Matters
Understanding the theoretical underpinnings of GNNs is crucial for advancing machine learning applications in graph-based data. This research addresses a significant gap in the literature, offering a framework that can enhance the deployment of GNNs in real-world scenarios where labeled data is scarce.
Key Takeaways
- GNNs excel in semi-supervised node regression tasks.
- The paper provides a non-asymptotic risk bound for GNN performance.
- Theoretical insights clarify how performance scales with the fraction of labeled nodes and graph-induced dependence.
- Empirical results support the theoretical framework.
- The research contributes to understanding GNN limitations and capabilities.
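The non-asymptotic risk bound mentioned above separates three error sources. As a schematic rendering only (the paper's exact terms, constants, and conditions are not reproduced here; the symbols $\hat f$, $\mathcal{F}$, $n_{\mathrm{lab}}$, and $G$ are illustrative placeholders), the decomposition has the shape:

```latex
\mathbb{E}\big[R(\hat f)\big]
\;\lesssim\;
\underbrace{\inf_{f \in \mathcal{F}} R(f)}_{\text{approximation error}}
\;+\;
\underbrace{\varepsilon_{\mathrm{stoch}}\!\big(n_{\mathrm{lab}},\, G\big)}_{\text{stochastic error}}
\;+\;
\underbrace{\varepsilon_{\mathrm{opt}}}_{\text{optimization error}}
```

Here the stochastic term is the one the paper makes explicit in the fraction of labeled nodes $n_{\mathrm{lab}}$ and in the dependence induced by the graph $G$.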
Statistics > Machine Learning
arXiv:2602.17115 (stat)
[Submitted on 19 Feb 2026]
Title: Semi-Supervised Learning on Graphs using Graph Neural Networks
Authors: Juntong Chen, Claire Donnat, Olga Klopp, Johannes Schmidt-Hieber
Abstract: Graph neural networks (GNNs) work remarkably well in semi-supervised node regression, yet a rigorous theory explaining when and why they succeed remains lacking. To address this gap, we study an aggregate-and-readout model that encompasses several common message passing architectures: node features are first propagated over the graph, then mapped to responses via a nonlinear function. For least-squares estimation over GNNs with linear graph convolutions and a deep ReLU readout, we prove a sharp non-asymptotic risk bound that separates approximation, stochastic, and optimization errors. The bound makes explicit how performance scales with the fraction of labeled nodes and graph-induced dependence. Approximation guarantees are further derived for graph-smoothing followed by smooth nonlinear readouts, yielding convergence rates that recover classical nonparametric behavior under full supervision while characterizing performance when labels are scarce. Numerical experiments validate our theory, providing a systematic framework for understanding GNN performance and limitations.
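The aggregate-and-readout model described in the abstract (linear graph convolutions followed by a deep ReLU readout) can be sketched in NumPy. This is an illustrative toy implementation under my own assumptions, not the authors' code; the normalization choice, layer sizes, and function names (`aggregate_and_readout`, `relu`) are all hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def aggregate_and_readout(A, X, conv_weights, mlp_weights, mlp_biases):
    """Sketch of an aggregate-and-readout GNN: linear graph convolutions
    (propagation with no nonlinearity) followed by a node-wise deep ReLU
    readout mapping each node's representation to a response."""
    # Symmetrically normalize the adjacency matrix, one common
    # propagation choice (assumption, not specified by the summary).
    deg = A.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    A_hat = d_inv_sqrt @ A @ d_inv_sqrt

    # Aggregation step: linear graph convolutions propagate features.
    H = X
    for W in conv_weights:
        H = A_hat @ H @ W

    # Readout step: deep ReLU network applied to each node independently.
    for W, b in zip(mlp_weights[:-1], mlp_biases[:-1]):
        H = relu(H @ W + b)
    return H @ mlp_weights[-1] + mlp_biases[-1]

# Toy example: 4-node path graph, 3 input features, scalar regression.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = rng.normal(size=(4, 3))
conv_w = [rng.normal(size=(3, 3))]                      # one linear convolution
mlp_w = [rng.normal(size=(3, 8)), rng.normal(size=(8, 1))]
mlp_b = [np.zeros(8), np.zeros(1)]
y_hat = aggregate_and_readout(A, X, conv_w, mlp_w, mlp_b)
print(y_hat.shape)  # one predicted response per node: (4, 1)
```

In a semi-supervised setting, least-squares fitting would then minimize the squared error of `y_hat` only over the labeled subset of nodes, while the aggregation step still uses the full graph.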