[2508.00955] From Generator to Embedder: Harnessing Innate Abilities of Multimodal LLMs via Building Zero-Shot Discriminative Embedding Model


arXiv - Machine Learning

About this article


Computer Science > Machine Learning

arXiv:2508.00955 (cs) [Submitted on 1 Aug 2025 (v1), last revised 27 Feb 2026 (this version, v2)]

Title: From Generator to Embedder: Harnessing Innate Abilities of Multimodal LLMs via Building Zero-Shot Discriminative Embedding Model

Authors: Yeong-Joon Ju, Seong-Whan Lee

Abstract: Adapting generative Multimodal Large Language Models (MLLMs) into universal embedding models typically demands resource-intensive contrastive pre-training, while traditional hard negative mining methods suffer from severe false negative contamination. In this paper, we propose a highly data-efficient framework that bypasses extensive pre-training to build a robust multimodal representation space. We first introduce a hierarchical embedding prompt that provides strong latent conditioning. By explicitly anchoring task definitions at the system level, this prompting strategy effectively bridges the modality gap and unlocks powerful zero-shot embedding capabilities. Building upon this latent conditioning, we present Self-aware Hard Negative Sampling (SaHa). Unlike conventional candidate-space mining, SaHa shifts the mechanism to the query-space by mapping retrieved candidates back to their owner queries to rigorously filter out semantic false negatives...
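The "hierarchical embedding prompt" described above anchors the task definition at the system level, with the per-input instruction beneath it. A minimal sketch of that idea in chat-message form follows; the wording, function name, and message structure are assumptions for illustration, not the paper's actual prompt:

```python
# Hedged sketch of a hierarchical embedding prompt: the task definition is
# anchored at the system level, and the instance to embed is supplied as the
# user turn. The exact phrasing here is hypothetical.
def build_embedding_messages(task_definition: str, instance_text: str) -> list[dict]:
    """Compose a system-level task anchor plus a user-level input."""
    return [
        {
            "role": "system",
            "content": (
                f"Task: {task_definition}\n"
                "Represent the following input as a single embedding for retrieval."
            ),
        },
        {"role": "user", "content": instance_text},
    ]
```

In this framing, the zero-shot behavior comes from conditioning the model on the task at the system level rather than fine-tuning it, so the same template can be reused across embedding tasks by swapping the task definition.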
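The query-space mechanism behind SaHa, as the abstract describes it, can be sketched as follows: each retrieved hard-negative candidate is mapped back to the query that owns it (the query it serves as a positive for), and candidates whose owner query is too similar to the current query are discarded as likely false negatives. All names, the cosine measure, and the threshold value below are assumptions, not the paper's implementation:

```python
# Hedged sketch of query-space false-negative filtering: a candidate is
# dropped when the query that "owns" it is near-identical to the current
# query, since such a candidate is probably a valid match, not a negative.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def filter_hard_negatives(query_id, candidate_ids, owner_of, query_emb,
                          threshold: float = 0.9):
    """Keep only candidates whose owner query is dissimilar to this query."""
    q = query_emb[query_id]
    kept = []
    for cid in candidate_ids:
        owner = owner_of[cid]       # the query this candidate is a positive for
        if owner == query_id:
            continue                # a query's own positive is never a negative
        if cosine(q, query_emb[owner]) >= threshold:
            continue                # owner query ~ current query: likely false negative
        kept.append(cid)
    return kept
```

The contrast with candidate-space mining is that the similarity test runs between queries rather than between the query and the candidate itself, which is what lets near-duplicate queries' positives be excluded from the negative pool.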

Originally published on March 02, 2026. Curated by AI News.


