[2511.00405] UME-R1: Exploring Reasoning-Driven Generative Multimodal Embeddings
Computer Science > Machine Learning
arXiv:2511.00405 (cs)
[Submitted on 1 Nov 2025 (v1), last revised 1 Mar 2026 (this version, v2)]

Title: UME-R1: Exploring Reasoning-Driven Generative Multimodal Embeddings
Authors: Zhibin Lan, Liqiang Niu, Fandong Meng, Jie Zhou, Jinsong Su

Abstract: The remarkable success of multimodal large language models (MLLMs) has driven advances in multimodal embeddings, yet existing models remain inherently discriminative, limiting their ability to benefit from the reasoning-driven generation paradigm. In this work, we pioneer the exploration of generative embeddings, unifying embedding tasks within a generative paradigm. We propose UME-R1, a universal multimodal embedding framework built on a two-stage training strategy: a cold-start supervised fine-tuning stage equips the model with reasoning capabilities and enables it to generate both discriminative and generative embeddings; a subsequent reinforcement learning stage enhances reasoning and further optimizes generative embedding quality. This pioneering work reveals four key insights: 1) generative embeddings unlock substantial performance gains over conventional discriminative embeddings by leveraging the powerful generative reasoning capabilities of MLLMs; 2) discriminative and generative embeddings are complementary; their combined oracle perfo...
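To make the "generative embedding" idea concrete, below is a minimal, hypothetical PyTorch sketch of how an embedding could be read out of a generative model after a reasoning step: the model autoregressively produces reasoning tokens, then a special <EMB> token, and the hidden state at that position is projected into the embedding space. This is not the authors' implementation; the ToyGenerativeEmbedder module, the <EMB> token id, and all dimensions are illustrative assumptions (a real MLLM would also use causal attention and a trained checkpoint).

import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyGenerativeEmbedder(nn.Module):
    """Toy sketch: generate reasoning tokens, then read an embedding off <EMB>."""
    def __init__(self, vocab_size=100, d_model=64, emb_dim=32, emb_token_id=99):
        super().__init__()
        self.emb_token_id = emb_token_id
        self.tok = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab_size)   # next-token prediction
        self.proj = nn.Linear(d_model, emb_dim)         # embedding head

    def forward(self, input_ids):
        h = self.encoder(self.tok(input_ids))
        return h, self.lm_head(h)

    @torch.no_grad()
    def embed(self, input_ids, max_reason_tokens=8):
        # Greedily "reason" token by token until <EMB> appears or budget runs out.
        ids = input_ids
        for _ in range(max_reason_tokens):
            _, logits = self.forward(ids)
            nxt = logits[:, -1].argmax(-1, keepdim=True)
            ids = torch.cat([ids, nxt], dim=1)
            if (nxt == self.emb_token_id).all():
                break
        # Force-append <EMB> if it was never generated, then read out its state.
        if ids[0, -1] != self.emb_token_id:
            emb_tok = torch.full((ids.size(0), 1), self.emb_token_id, dtype=ids.dtype)
            ids = torch.cat([ids, emb_tok], dim=1)
        h, _ = self.forward(ids)
        return F.normalize(self.proj(h[:, -1]), dim=-1)  # unit-norm embedding

model = ToyGenerativeEmbedder()
query = torch.randint(0, 98, (1, 5))   # toy token ids standing in for a query
doc = torch.randint(0, 98, (1, 7))     # toy token ids standing in for a document
sim = model.embed(query) @ model.embed(doc).T  # cosine similarity for retrieval
print(sim.item())

In this framing, the same generative forward pass that produces reasoning text also yields the embedding, which is what lets reinforcement learning on the generated output further shape embedding quality, as the abstract describes.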