[2602.19622] VecFormer: Towards Efficient and Generalizable Graph Transformer with Graph Token Attention
Summary
VecFormer introduces a novel Graph Transformer for node classification that addresses two issues in existing models: the computational cost of full attention on large graphs and poor performance in out-of-distribution (OOD) scenarios.
Why It Matters
As graph representation learning becomes increasingly important in various applications, VecFormer offers a solution to the scalability and generalization challenges faced by existing models. This advancement could significantly impact fields relying on large graph data, such as social network analysis and bioinformatics.
Key Takeaways
- VecFormer utilizes a two-stage training process to improve efficiency.
- The model reduces computational complexity by employing Graph Token attention mechanisms.
- Extensive experiments show VecFormer outperforms existing Graph Transformers in both efficiency and accuracy.
- The approach enhances generalization capabilities in out-of-distribution scenarios.
- This research contributes to the ongoing evolution of graph representation learning.
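The takeaway about reduced complexity can be made concrete. If each node attends to a small, fixed set of K graph tokens rather than to all N other nodes, attention costs O(N·K) instead of O(N²). The sketch below is a minimal, hedged illustration of this idea in NumPy; the function name `token_attention` and the single-head, unprojected formulation are assumptions for clarity, not the paper's exact mechanism.

```python
import numpy as np

def token_attention(nodes, tokens):
    """Cross-attention from N node embeddings to K graph tokens.

    Illustrates how attending to K tokens (K << N) costs O(N*K)
    rather than the O(N^2) of full node-to-node attention.
    nodes: (N, d), tokens: (K, d). Returns updated nodes (N, d).
    """
    d = nodes.shape[-1]
    scores = nodes @ tokens.T / np.sqrt(d)            # (N, K) attention logits
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)     # softmax over the K tokens
    return weights @ tokens                           # each node becomes a convex
                                                      # combination of token vectors
```

Because K is a constant set by the codebook size, the cost grows linearly in the number of nodes, which is what makes this style of attention scale to large graphs.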
Computer Science > Machine Learning
arXiv:2602.19622 (cs) [Submitted on 23 Feb 2026]
Title: VecFormer: Towards Efficient and Generalizable Graph Transformer with Graph Token Attention
Authors: Jingbo Zhou, Jun Xia, Siyuan Li, Yunfan Liu, Wenjun Wang, Yufei Huang, Changxi Chi, Mutian Hong, Zhuoli Ouyang, Shu Wang, Zhongqi Wang, Xingyu Wu, Chang Yu, Stan Z. Li
Abstract: Graph Transformer has demonstrated impressive capabilities in the field of graph representation learning. However, existing approaches face two critical challenges: (1) most models suffer from exponentially increasing computational complexity, making it difficult to scale to large graphs; (2) attention mechanisms based on node-level operations limit the flexibility of the model and result in poor generalization performance in out-of-distribution (OOD) scenarios. To address these issues, we propose VecFormer (the Vector Quantized Graph Transformer), an efficient and highly generalizable model for node classification, particularly under OOD settings. VecFormer adopts a two-stage training paradigm. In the first stage, two codebooks are used to reconstruct the node features and the graph structure, aiming to learn the rich semantic Graph Codes. In the second stage, attention mechanisms are perfo...
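The first-stage use of codebooks described in the abstract can be sketched with a standard vector-quantization step: each node embedding is assigned to its nearest codebook vector, yielding a discrete code. This is a minimal illustration assuming a VQ-VAE-style nearest-neighbor assignment; the function name `quantize` and the sizes are hypothetical, and the paper's actual Graph Code construction (two codebooks, feature and structure reconstruction losses) is not reproduced here.

```python
import numpy as np

def quantize(node_embeddings, codebook):
    """Assign each node embedding to its nearest codebook vector.

    node_embeddings: (N, d), codebook: (K, d).
    Returns the discrete code per node (N,) and the quantized
    vectors (N, d) that replace the original embeddings.
    """
    # Squared Euclidean distance from every node to every code vector: (N, K).
    dists = ((node_embeddings[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    codes = dists.argmin(axis=1)      # one discrete "Graph Code" per node
    quantized = codebook[codes]       # snap each embedding to its code vector
    return codes, quantized

rng = np.random.default_rng(0)
nodes = rng.normal(size=(5, 8))       # 5 node embeddings, dimension 8
codebook = rng.normal(size=(16, 8))   # 16 learnable code vectors
codes, q = quantize(nodes, codebook)
```

In training, the codebook entries and the encoder would be learned jointly (typically with a straight-through gradient estimator); the sketch only shows the discrete assignment that produces the codes.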