[2602.19591] Detecting High-Potential SMEs with Heterogeneous Graph Neural Networks
Summary
This article presents SME-HGT, a Heterogeneous Graph Transformer framework designed to identify high-potential small and medium enterprises (SMEs) by predicting which SBIR Phase I awardees will advance to Phase II funding, using only public data. On a temporally split test set, it outperforms an MLP baseline and R-GCN in AUPRC.
Why It Matters
SMEs constitute 99.9% of U.S. businesses and generate 44% of economic activity, yet there is no systematic way to identify the high-potential ones. This research applies graph neural networks to the problem, which could inform funding decisions by policymakers and investors.
Key Takeaways
- SME-HGT predicts which SBIR Phase I awardees will advance to Phase II funding.
- Uses a heterogeneous graph of company, research topic, and government agency nodes linked by three relation types.
- Achieves an AUPRC of 0.621, compared with 0.590 for an MLP baseline and 0.608 for R-GCN, and 89.6% precision at a screening depth of 100 companies.
- Relies exclusively on public data, which supports reproducibility.
- Highlights the importance of relational structures in assessing SME potential.
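To make the graph structure in the takeaways concrete, here is a minimal sketch of a heterogeneous graph with typed nodes and typed edges in plain Python. It is an illustration only, not the authors' implementation: the `HeteroGraph` class, the relation names (`works_on`, `funded_by`, `sponsored_by`), and the toy records are all hypothetical, since the source names the three node types but not the relation types.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class HeteroGraph:
    """Minimal typed graph: nodes grouped by type, edges grouped by relation."""

    # node type -> set of node ids
    nodes: dict = field(default_factory=lambda: defaultdict(set))
    # (source type, relation, target type) -> list of (source id, target id)
    edges: dict = field(default_factory=lambda: defaultdict(list))

    def add_edge(self, src_type, src_id, relation, dst_type, dst_id):
        """Register both endpoints and store the edge under its typed key."""
        self.nodes[src_type].add(src_id)
        self.nodes[dst_type].add(dst_id)
        self.edges[(src_type, relation, dst_type)].append((src_id, dst_id))

    def neighbors(self, src_type, src_id, relation, dst_type):
        """Targets reachable from src_id through one relation type."""
        pairs = self.edges.get((src_type, relation, dst_type), [])
        return [dst for src, dst in pairs if src == src_id]


graph = HeteroGraph()
# Hypothetical toy records, not data from the paper.
graph.add_edge("company", "acme", "works_on", "topic", "photonics")
graph.add_edge("company", "acme", "funded_by", "agency", "DOE")
graph.add_edge("company", "birch", "works_on", "topic", "photonics")
graph.add_edge("topic", "photonics", "sponsored_by", "agency", "DOE")

node_counts = {node_type: len(ids) for node_type, ids in graph.nodes.items()}
relation_count = len(graph.edges)
```

Keying edges by a (source type, relation, target type) triple is what lets a relation-aware model treat each relation separately. In this toy graph, two companies that never interact directly are still connected through a shared topic node.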
Computer Science > Machine Learning. arXiv:2602.19591 (cs). Submitted on 23 Feb 2026.
Authors: Yijiashun Qi, Hanzhe Guo, Yijiazhen Qi
Abstract: Small and Medium Enterprises (SMEs) constitute 99.9% of U.S. businesses and generate 44% of economic activity, yet systematically identifying high-potential SMEs remains an open challenge. We introduce SME-HGT, a Heterogeneous Graph Transformer framework that predicts which SBIR Phase I awardees will advance to Phase II funding using exclusively public data. We construct a heterogeneous graph with 32,268 company nodes, 124 research topic nodes, and 13 government agency nodes connected by approximately 99,000 edges across three semantic relation types. SME-HGT achieves an AUPRC of 0.621 ± 0.003 on a temporally-split test set, outperforming an MLP baseline (0.590 ± 0.002) and R-GCN (0.608 ± 0.013) across five random seeds. At a screening depth of 100 companies, SME-HGT attains 89.6% precision with a 2.14× lift over random selection. Our temporal evaluation protocol prevents information leakage, and our reliance on public data ensures reproducibility. These results demonstrate that relational structure among firms, research topics, and funding agencies provides meaningful signal for SME potential assessm...
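The screening figures in the abstract (precision at a depth of 100 and lift over random selection) can be illustrated with a short sketch. It assumes the common definition of lift as precision at k divided by the share of positives in the evaluated set; the abstract does not spell out its definition, so treat that as an assumption. The function names and toy scores are hypothetical.

```python
def precision_at_k(scores, labels, k):
    """Fraction of positives among the k highest-scored items."""
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of items")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k


def lift_at_k(scores, labels, k):
    """Precision at k relative to the base rate (assumed definition of lift)."""
    base_rate = sum(labels) / len(labels)
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no positives")
    return precision_at_k(scores, labels, k) / base_rate


# Hypothetical toy data: 8 companies, 3 of which advanced (label 1).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0]

p_at_3 = precision_at_k(scores, labels, 3)  # 2 of the top 3 are positives
lift_3 = lift_at_k(scores, labels, 3)       # (2/3) divided by (3/8)

# Under the same assumed definition, the reported 89.6% precision and 2.14 lift
# imply a positive base rate of roughly 0.896 / 2.14 in the test set.
implied_base_rate = 0.896 / 2.14
```

Read this way, the reported numbers suggest that roughly four in ten Phase I awardees in the test set advanced to Phase II, so random screening would already be right fairly often and the model roughly doubles that hit rate at the top of the ranking.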