[2602.22660] LEDA: Latent Semantic Distribution Alignment for Multi-domain Graph Pre-training
Summary
The paper presents LEDA (Latent sEmantic Distribution Alignment), a model for multi-domain graph pre-training that addresses two challenges: aligning highly diverse graph data and guiding cross-domain semantic learning.
Why It Matters
As machine learning models increasingly rely on graph representations, effective pre-training across multiple domains is crucial for improving performance in various applications. LEDA's innovative approach to semantic alignment could significantly enhance the adaptability and effectiveness of graph-based models in real-world scenarios.
Key Takeaways
- LEDA introduces a dimension projection unit for aligning diverse domain features into a shared semantic space.
- The model employs a variational semantic inference module to guide domain projection and ensure cross-domain semantic learning.
- LEDA outperforms existing in-domain baselines and advanced universal pre-training models in few-shot cross-domain settings.
- The approach addresses limitations of simplistic data alignment and arbitrary in-domain pre-training paradigms.
- LEDA demonstrates strong performance across a variety of graphs and downstream tasks.
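The two components named above (a dimension projection unit and a variational semantic inference module) can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy, not the paper's implementation: the function names, weight shapes, and the choice of a standard-normal prior are all hypothetical, chosen only to show how per-domain projections can feed a shared variational head.

```python
import numpy as np

rng = np.random.default_rng(0)

def project_to_shared_space(x, W, b):
    """Hypothetical dimension projection unit: maps domain-specific
    features (d_in) into a shared semantic space (d_shared)."""
    return x @ W + b

def variational_encode(h, W_mu, W_logvar):
    """Hypothetical variational semantic inference step: parameterize a
    Gaussian over latent semantics, sample via reparameterization, and
    return the KL divergence to a standard-normal prior."""
    mu, logvar = h @ W_mu, h @ W_logvar
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps
    kl = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=1)
    return z, kl

# Two domains with different raw feature dimensions.
d_shared, d_latent = 16, 8
x_a = rng.standard_normal((4, 32))   # e.g. citation-graph node features
x_b = rng.standard_normal((4, 64))   # e.g. social-graph node features

# Per-domain projection weights into the shared semantic space.
W_a, b_a = 0.1 * rng.standard_normal((32, d_shared)), np.zeros(d_shared)
W_b, b_b = 0.1 * rng.standard_normal((64, d_shared)), np.zeros(d_shared)
h_a = project_to_shared_space(x_a, W_a, b_a)
h_b = project_to_shared_space(x_b, W_b, b_b)

# A single shared variational head pulls both domains toward one prior,
# so their latent semantic distributions are aligned.
W_mu = 0.1 * rng.standard_normal((d_shared, d_latent))
W_logvar = 0.1 * rng.standard_normal((d_shared, d_latent))
z_a, kl_a = variational_encode(h_a, W_mu, W_logvar)
z_b, kl_b = variational_encode(h_b, W_mu, W_logvar)
print(z_a.shape, z_b.shape)  # both (4, 8): aligned latent representations
```

In training, the KL terms would regularize each domain's latent distribution toward the common prior while a reconstruction or contrastive objective preserves domain-specific signal; here they are computed only to show the alignment mechanism.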
Computer Science > Machine Learning
arXiv:2602.22660 (cs) [Submitted on 26 Feb 2026]
Title: LEDA: Latent Semantic Distribution Alignment for Multi-domain Graph Pre-training
Authors: Lianze Shan, Jitao Zhao, Dongxiao He, Siqi Liu, Jiaxu Cui, Weixiong Zhang
Abstract: Recent advances in generic large models, such as GPT and DeepSeek, have motivated the introduction of universality to graph pre-training, aiming to learn rich and generalizable knowledge across diverse domains using graph representations to improve performance in various downstream applications. However, most existing methods face challenges in learning effective knowledge from generic graphs, primarily due to simplistic data alignment and limited training guidance. The issue of simplistic data alignment arises from the use of a straightforward unification for highly diverse graph data, which fails to align semantics and misleads pre-training models. The problem of limited training guidance lies in the arbitrary application of in-domain pre-training paradigms to cross-domain scenarios: while effective for enhancing discriminative representation in one data space, they struggle to capture effective knowledge from many graphs. To address these challenges, we propose a novel Latent sEmantic Distribution Alignment (LEDA) model for universal ...