[2504.06193] Heuristic Methods are Good Teachers to Distill MLPs for Graph Link Prediction
Summary
This paper explores the effectiveness of heuristic methods as teachers when distilling Multi-Layer Perceptrons (MLPs) for graph link prediction, revealing that simpler teachers can produce stronger students than more complex ones in certain scenarios.
Why It Matters
Understanding the role of heuristic methods in model distillation can lead to more efficient machine learning practices, particularly in graph-based tasks. The findings challenge conventional wisdom about model complexity and performance, offering practical implications for researchers and practitioners in the field.
Key Takeaways
- Heuristic methods can effectively teach MLPs, sometimes outperforming complex GNNs.
- The proposed Ensemble Heuristic-Distilled MLPs (EHDM) approach significantly reduces training time.
- Stronger teachers do not always yield stronger students in model distillation.
- EHDM shows an average improvement of 7.93% over previous methods.
- The study emphasizes the importance of exploring alternative teaching methods in machine learning.
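To make the distillation idea in the takeaways concrete, here is a minimal sketch of score-based teacher-to-student distillation for link prediction: a one-layer logistic "MLP" student is trained to match a teacher's soft link scores. All names, shapes, the synthetic data, and the MSE matching loss are illustrative assumptions, not the paper's exact EHDM formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Node-pair features (e.g., concatenated node embeddings) and teacher
# link probabilities; here the teacher is a stand-in synthetic scorer.
X = rng.normal(size=(200, 8))      # 200 candidate links, 8 features each
w_teacher = rng.normal(size=8)
teacher = sigmoid(X @ w_teacher)   # soft labels the student will imitate

# Distill: gradient descent on the MSE between student and teacher scores.
w = np.zeros(8)
lr = 1.0
for _ in range(2000):
    pred = sigmoid(X @ w)
    grad = X.T @ ((pred - teacher) * pred * (1 - pred)) / len(X)
    w -= lr * grad

mse = float(np.mean((sigmoid(X @ w) - teacher) ** 2))
print(f"student-teacher MSE after distillation: {mse:.4f}")
```

After training, the student reproduces the teacher's scores closely while needing only pairwise features at inference time, which is the efficiency argument for GNN-to-MLP distillation.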
Computer Science > Machine Learning
arXiv:2504.06193 (cs)
[Submitted on 8 Apr 2025 (v1), last revised 15 Feb 2026 (this version, v3)]
Title: Heuristic Methods are Good Teachers to Distill MLPs for Graph Link Prediction
Authors: Zongyue Qin, Shichang Zhang, Mingxuan Ju, Tong Zhao, Neil Shah, Yizhou Sun
Abstract: Link prediction is a crucial graph-learning task with applications including citation prediction and product recommendation. Distilling Graph Neural Network (GNN) teachers into Multi-Layer Perceptron (MLP) students has emerged as an effective approach to achieve strong performance while reducing computational cost by removing graph dependency. However, existing distillation methods use only standard GNNs and overlook alternative teachers such as specialized models for link prediction (GNN4LP) and heuristic methods (e.g., common neighbors). This paper first explores the impact of different teachers in GNN-to-MLP distillation. Surprisingly, we find that stronger teachers do not always produce stronger students: MLPs distilled from GNN4LP can underperform those distilled from simpler GNNs, while weaker heuristic methods can teach MLPs to near-GNN performance with drastically reduced training costs. Building on these insights, we propose Ensemble Heuristic-Distilled MLPs (EHDM), which eliminates graph...
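The abstract cites common neighbors as an example of a heuristic teacher. A minimal sketch of that heuristic, using an illustrative adjacency-set representation (the function name and graph are assumptions, not from the paper): a candidate link (u, v) is scored by how many neighbors the two endpoints share.

```python
def common_neighbors_score(adj, u, v):
    """Common-neighbors (CN) heuristic for link prediction.

    adj: dict mapping each node to the set of its neighbors.
    Returns the number of neighbors shared by u and v.
    """
    return len(adj[u] & adj[v])

# Tiny example graph with edges 0-1, 0-2, 1-2, 1-3.
adj = {
    0: {1, 2},
    1: {0, 2, 3},
    2: {0, 1},
    3: {1},
}

# Nodes 0 and 3 share exactly one neighbor (node 1), so CN(0, 3) = 1.
print(common_neighbors_score(adj, 0, 3))  # prints 1
```

Such heuristic scores require no training at all, which is why using them as soft teacher signals can cut distillation cost so drastically relative to training a GNN teacher first.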