[2603.02435] VL-KGE: Vision-Language Models Meet Knowledge Graph Embeddings
Computer Science > Artificial Intelligence
arXiv:2603.02435 (cs)
[Submitted on 2 Mar 2026]

Title: VL-KGE: Vision-Language Models Meet Knowledge Graph Embeddings
Authors: Athanasios Efthymiou, Stevan Rudinac, Monika Kackovic, Nachoem Wijnberg, Marcel Worring

Abstract: Real-world multimodal knowledge graphs (MKGs) are inherently heterogeneous, modeling entities that are associated with diverse modalities. Traditional knowledge graph embedding (KGE) methods excel at learning continuous representations of entities and relations, yet they are typically designed for unimodal settings. Recent approaches extend KGE to multimodal settings but remain constrained: they often process modalities in isolation, resulting in weak cross-modal alignment, and rely on simplistic assumptions such as uniform modality availability across entities. Vision-Language Models (VLMs) offer a powerful way to align diverse modalities within a shared embedding space. We propose Vision-Language Knowledge Graph Embeddings (VL-KGE), a framework that integrates cross-modal alignment from VLMs with structured relational modeling to learn unified multimodal representations of knowledge graphs. Experiments on WN9-IMG and two novel fine art MKGs, WikiArt-MKG-v1 and WikiArt-MKG-v2, demonstrate that VL-KGE consistently improves over traditional unimodal...
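The abstract does not spell out the VL-KGE architecture, but the general idea of coupling VLM-aligned entity features with structured relational scoring can be illustrated with a small sketch. The snippet below is an assumption-laden toy example, not the authors' method: it fuses hypothetical precomputed CLIP-style entity features with a TransE-style score, and falls back to purely structural embeddings when an entity has no associated modality (mirroring the non-uniform modality availability the abstract highlights). All class names, dimensions, and the choice of TransE are illustrative assumptions.

```python
# Toy sketch only: combines frozen VLM-derived entity features with a
# TransE-style relational score. Not the VL-KGE architecture from the paper.
from typing import Optional

import torch
import torch.nn as nn


class ToyMultimodalKGE(nn.Module):
    def __init__(self, num_entities: int, num_relations: int,
                 vlm_dim: int = 512, kge_dim: int = 128):
        super().__init__()
        # Structural entity/relation embeddings, as in classical KGE methods.
        self.ent = nn.Embedding(num_entities, kge_dim)
        self.rel = nn.Embedding(num_relations, kge_dim)
        # Projection from the (frozen) VLM embedding space into the KGE space.
        self.proj = nn.Linear(vlm_dim, kge_dim)

    def entity_repr(self, idx: torch.Tensor,
                    vlm_feat: Optional[torch.Tensor]) -> torch.Tensor:
        """Fuse structural and VLM-derived features; fall back to the purely
        structural embedding when an entity has no visual/textual modality."""
        e = self.ent(idx)
        if vlm_feat is not None:
            e = e + self.proj(vlm_feat)
        return e

    def score(self, h, r, t, h_feat=None, t_feat=None):
        # TransE-style plausibility: higher (less negative) = more plausible.
        head = self.entity_repr(h, h_feat)
        tail = self.entity_repr(t, t_feat)
        return -(head + self.rel(r) - tail).norm(p=2, dim=-1)


if __name__ == "__main__":
    model = ToyMultimodalKGE(num_entities=1000, num_relations=20)
    h, r, t = torch.tensor([3]), torch.tensor([5]), torch.tensor([42])
    # Random vectors stand in for precomputed VLM image/text embeddings.
    h_feat = torch.randn(1, 512)
    print(model.score(h, r, t, h_feat=h_feat).item())
```

In this sketch the VLM features are treated as fixed inputs and only the projection and structural embeddings would be trained; whether VL-KGE freezes or fine-tunes the VLM, and how it scores triples, is not stated in the abstract.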