[2603.01385] Toward Graph-Tokenizing Large Language Models with Reconstructive Graph Instruction Tuning
Computer Science > Computation and Language
arXiv:2603.01385 (cs)
[Submitted on 2 Mar 2026]

Title: Toward Graph-Tokenizing Large Language Models with Reconstructive Graph Instruction Tuning
Authors: Zhongjian Zhang, Xiao Wang, Mengmei Zhang, Jiarui Tan, Chuan Shi

Abstract: The remarkable success of large language models (LLMs) has motivated researchers to adapt them as universal predictors for various graph-related tasks, with the ultimate goal of developing a graph foundation model that generalizes across diverse scenarios. The key challenge is to align graph data with the language space so that LLMs can better comprehend graphs. As a popular paradigm, Graph-Tokenizing LLMs (GTokenLLMs) encode complex structures and lengthy texts into a graph token sequence, and then align it with text tokens via language instruction tuning. Despite their initial success, our information-theoretic analysis reveals that existing GTokenLLMs rely solely on text supervision from language instructions, achieving only implicit graph-text alignment and resulting in a text-dominant bias that underutilizes graph context. To overcome this limitation, we first prove that the alignment objective is upper-bounded by the mutual information between the input graphs and their hidden representations in the LLM, which motivates...
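The GTokenLLM paradigm described in the abstract, where a graph encoder compresses structure and node text into a short sequence of "graph tokens" that is prepended to the text-token embeddings, can be illustrated with a minimal numeric sketch. All names and shapes below are illustrative assumptions (the paper's actual encoder, projector, and token counts are not specified in this abstract); a toy mean-aggregation layer stands in for the graph encoder.

```python
import numpy as np

def graph_to_tokens(node_feats, adj, num_tokens=4, dim=8, seed=0):
    """Encode a graph into a fixed-length sequence of graph tokens.

    Toy stand-in for a GTokenLLM-style graph encoder: one round of
    mean-neighbor aggregation (a simple GNN layer), a random linear
    projection into the LLM embedding dimension, then pooling the
    nodes into `num_tokens` tokens. All choices here are illustrative.
    """
    rng = np.random.default_rng(seed)
    deg = adj.sum(axis=1, keepdims=True).clip(min=1)
    h = (adj @ node_feats) / deg              # mean aggregation over neighbors
    proj = rng.standard_normal((node_feats.shape[1], dim))
    h = h @ proj                              # project into LLM embedding space
    groups = np.array_split(h, num_tokens)    # pool nodes into num_tokens groups
    return np.stack([g.mean(axis=0) for g in groups])

# Toy input: a 6-node path graph with 3-dimensional node features.
adj = np.eye(6, k=1) + np.eye(6, k=-1)
feats = np.arange(18, dtype=float).reshape(6, 3)

g_tokens = graph_to_tokens(feats, adj)        # shape (4, 8): graph token sequence
text_tokens = np.zeros((5, 8))                # stand-in text-token embeddings
seq = np.concatenate([g_tokens, text_tokens]) # graph tokens prepended to text
print(seq.shape)
```

In a real GTokenLLM the projection is learned, the text tokens come from the LLM's embedding table, and the combined sequence is what instruction tuning supervises; the abstract's point is that this supervision touches only the text side, leaving the graph-to-text alignment implicit.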