[2410.16882] SaVe-TAG: LLM-based Interpolation for Long-Tailed Text-Attributed Graphs
Summary
The paper presents SaVe-TAG, a novel framework that uses Large Language Models for semantic-aware interpolation on long-tailed text-attributed graphs, improving node classification performance on imbalanced datasets.
Why It Matters
This research addresses class imbalance in graph neural networks, particularly on text-attributed graphs. By generating synthetic samples at the text level rather than through embedding-space arithmetic, LLM-based interpolation preserves semantics, which is crucial for model generalization in real-world applications where tail classes have few examples.
Key Takeaways
- SaVe-TAG leverages LLMs for semantic-aware interpolation in graph data.
- The method addresses class imbalance in long-tailed distributions effectively.
- A confidence-based edge assignment mechanism ensures structural consistency.
- Extensive experiments demonstrate superior performance over existing methods.
- The approach highlights the importance of combining semantic and structural signals.
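The text-level interpolation idea can be made concrete with a small sketch. The prompt format below is hypothetical (the paper's actual prompts are not reproduced here); the point is that two minority-class documents are blended in natural language by an LLM, rather than averaged in embedding space.

```python
def build_interpolation_prompt(text_a: str, text_b: str, label: str) -> str:
    """Build a prompt asking an LLM to interpolate between two
    minority-class documents at the text level.

    Hypothetical prompt format for illustration; the LLM call itself
    (e.g., via any chat-completion API) is left out.
    """
    return (
        f"Both documents below belong to the class '{label}'.\n"
        f"Document A: {text_a}\n"
        f"Document B: {text_b}\n"
        "Write a new document of the same class that blends the "
        "content of A and B."
    )

# Example: two tail-class node texts from a citation graph
prompt = build_interpolation_prompt(
    "A study of message passing in GNNs.",
    "Long-tailed learning with re-weighted losses.",
    "cs.LG",
)
```

The generated text would then be embedded and inserted as a new minority-class node, which is where the edge-assignment step takes over.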
Computer Science > Artificial Intelligence
arXiv:2410.16882 (cs)
[Submitted on 22 Oct 2024 (v1), last revised 13 Feb 2026 (this version, v5)]
Authors: Leyao Wang, Yu Wang, Bo Ni, Yuying Zhao, Hanyu Wang, Yao Ma, Tyler Derr
Abstract: Real-world graph data often follows long-tailed distributions, making it difficult for Graph Neural Networks (GNNs) to generalize well across both head and tail classes. Recent advances in Vicinal Risk Minimization (VRM) have shown promise in mitigating class imbalance with numeric interpolation; however, existing approaches largely rely on embedding-space arithmetic, which fails to capture the rich semantics inherent in text-attributed graphs. In this work, we propose our method, SaVe-TAG (Semantic-aware Vicinal Risk Minimization for Long-Tailed Text-Attributed Graphs), a novel VRM framework that leverages Large Language Models (LLMs) to perform text-level interpolation, generating on-manifold, boundary-enriching synthetic samples for minority classes. To mitigate the risk of noisy generation, we introduce a confidence-based edge assignment mechanism that uses graph topology as a natural filter to ensure structural consistency. We provide theoretical justification for our method and conduct extensive ex...
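The confidence-based edge assignment described in the abstract can be sketched minimally: a synthetic node is wired into the graph only where a confidence score is high enough, so topology filters out noisy generations. The similarity-as-confidence proxy and the `tau`/`k` parameters below are assumptions for illustration, not the paper's exact mechanism.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def assign_edges(synth_emb, node_embs, tau=0.8, k=2):
    """Connect a synthetic node to at most k existing nodes whose
    confidence (here, embedding similarity as a stand-in score)
    exceeds the threshold tau; low-confidence links are dropped."""
    scored = sorted(
        ((cosine(synth_emb, emb), i) for i, emb in enumerate(node_embs)),
        reverse=True,
    )
    return [i for score, i in scored[:k] if score >= tau]

# Toy example: the synthetic node attaches to nodes 0 and 2,
# while the dissimilar node 1 is filtered out.
neighbors = assign_edges([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
```

A real implementation would use a trained GNN's class-confidence scores rather than raw similarity, but the filtering structure is the same.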