[2602.21864] DynamicGTR: Leveraging Graph Topology Representation Preferences to Boost VLM Capabilities on Graph QAs
Summary
The paper presents DynamicGTR, a framework that enhances Vision-Language Models (VLMs) by dynamically selecting optimal graph topology representations for improved zero-shot question answering on graph-related queries.
Why It Matters
DynamicGTR addresses the limitations of existing graph topology representation methods in VLMs, which often lead to inaccurate or unnecessarily long responses. By selecting the representation dynamically based on the specific query, it improves model performance across a range of applications, making it a significant advancement in VLM-based graph question answering.
Key Takeaways
- DynamicGTR improves the accuracy and efficiency of VLMs in handling graph-related queries.
- The framework allows for a customizable trade-off between accuracy and brevity in responses.
- DynamicGTR demonstrates strong transferability across different tasks and domains without additional training.
- The approach addresses the limitations of fixed graph topology representations in existing models.
- Extensive experiments validate the effectiveness of DynamicGTR in real-world applications.
Computer Science > Computer Vision and Pattern Recognition
arXiv:2602.21864 (cs) [Submitted on 25 Feb 2026]
Authors: Yanbin Wei, Jiangyue Yan, Chun Kang, Yang Chen, Hua Liu, James Kwok, Yu Zhang
Abstract: Vision-Language Models (VLMs) have emerged as versatile solutions for zero-shot question answering (QA) across various domains. However, enabling VLMs to effectively comprehend structured graphs and perform accurate, efficient QA remains challenging. Existing approaches typically rely on a single graph topology representation (GTR), such as fixed-style visual images or unified text descriptions. This "one-size-fits-all" strategy often neglects model-specific and task-specific preferences, resulting in inaccurate or over-lengthy responses to graph-related queries. To address this, we propose the DynamicGTR framework, which dynamically selects the optimal GTR for each query during inference, thereby enhancing the zero-shot graph QA capabilities of VLMs with a customizable accuracy-brevity trade-off. Extensive experiments show that DynamicGTR not only improves VLM-based graph algorithm QA performance but also successfully transfers the experience trained fr...
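To make the core idea concrete, here is a minimal sketch of per-query GTR selection with an accuracy/brevity trade-off. All names and numbers (the candidate GTR list, the score tables, the `brevity_weight` parameter) are illustrative assumptions, not the paper's actual implementation, which is not specified in this summary.

```python
# Hypothetical sketch: pick the graph topology representation (GTR)
# that maximizes expected accuracy minus a weighted brevity penalty.
# All values below are made-up placeholders for illustration.

CANDIDATE_GTRS = ["visual_image", "adjacency_list_text", "edge_list_text"]

# Assumed offline-estimated expected accuracy per (task, GTR) pair.
EXPECTED_ACC = {
    ("shortest_path", "visual_image"): 0.62,
    ("shortest_path", "adjacency_list_text"): 0.71,
    ("shortest_path", "edge_list_text"): 0.68,
}

# Assumed relative prompt/response length per GTR, normalized to [0, 1].
LENGTH_COST = {
    "visual_image": 0.2,
    "adjacency_list_text": 0.6,
    "edge_list_text": 0.5,
}

def select_gtr(task: str, brevity_weight: float = 0.3) -> str:
    """Return the GTR maximizing accuracy minus a brevity penalty.

    brevity_weight = 0 optimizes accuracy only; larger values trade
    accuracy for shorter prompts/responses.
    """
    def utility(gtr: str) -> float:
        return EXPECTED_ACC[(task, gtr)] - brevity_weight * LENGTH_COST[gtr]
    return max(CANDIDATE_GTRS, key=utility)

print(select_gtr("shortest_path", brevity_weight=0.0))  # accuracy only
print(select_gtr("shortest_path", brevity_weight=1.0))  # favors brevity
```

With these placeholder scores, a pure-accuracy setting picks the text representation with the highest expected accuracy, while a strong brevity weight shifts the choice toward the compact visual representation, mirroring the customizable trade-off described in the abstract.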