[2602.03098] TextME: Bridging Unseen Modalities Through Text Descriptions


Summary

The paper introduces TextME, a framework that enables zero-shot cross-modal transfer using only text descriptions, removing the dependence on large-scale paired datasets that limits multimodal representation learning.

Why It Matters

TextME removes a key bottleneck in multimodal representation learning: the need for costly paired datasets. This matters most in fields such as medical imaging and molecular analysis, where expert annotations are scarce. By supporting cross-modal retrieval even between modalities that were never explicitly aligned, the framework broadens where multimodal methods can be applied.

Key Takeaways

  • TextME enables modality expansion using only text descriptions.
  • The framework allows zero-shot cross-modal transfer, enhancing flexibility.
  • Empirical validation shows substantial performance retention without paired supervision.
  • Text-only training can facilitate emergent retrieval between unaligned modalities.
  • This approach is a practical alternative to traditional paired dataset methods.

Computer Science > Machine Learning
arXiv:2602.03098 (cs)
[Submitted on 3 Feb 2026 (v1), last revised 23 Feb 2026 (this version, v2)]

Title: TextME: Bridging Unseen Modalities Through Text Descriptions
Authors: Soyeon Hong, Jinchan Kim, Jaegook You, Seungtaek Choi, Suha Kwak, Hyunsouk Cho

Abstract: Expanding multimodal representations to novel modalities is constrained by reliance on large-scale paired datasets (e.g., text-image, text-audio, text-3D, text-molecule), which are costly and often infeasible in domains requiring expert annotation, such as medical imaging and molecular analysis. We introduce TextME, to the best of our knowledge the first text-only modality expansion framework, which projects diverse modalities into the LLM embedding space as a unified anchor. Our approach exploits the geometric structure of pretrained contrastive encoders to enable zero-shot cross-modal transfer using only text descriptions, without paired supervision. We empirically validate that consistent modality gaps exist across image, video, audio, 3D, X-ray, and molecular domains, demonstrating that text-only training can preserve substantial performance of pretrained encoders. We further show that our framework enables emergent cross-modal retrieval between modality pairs not explicitly aligned during training (e.g., audio-to-image, 3D-to...
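The abstract's central geometric observation, that pretrained contrastive encoders exhibit a roughly consistent offset (a "modality gap") between modalities, can be sketched in a few lines. The toy example below is not the paper's code: encoder outputs are simulated with random vectors, and estimating the gap from unpaired per-modality means is an illustrative assumption, not a confirmed detail of TextME. It shows how a constant per-modality offset lets a text query retrieve in an image embedding space without any paired supervision.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for a pretrained contrastive encoder's outputs.
# We synthesize image embeddings as text embeddings plus a constant
# offset (the "modality gap") and a little noise.
dim = 64
gap = rng.normal(size=dim)                 # hypothetical constant modality gap
text_emb = rng.normal(size=(100, dim))
image_emb = text_emb + gap + 0.01 * rng.normal(size=(100, dim))

# Estimate the gap from UNPAIRED statistics: the difference of the
# per-modality mean embeddings. No (text, image) pairs are used.
est_gap = image_emb.mean(axis=0) - text_emb.mean(axis=0)

def retrieve(query_text_emb, gallery):
    """Shift a text query across the estimated gap, then retrieve the
    nearest gallery item by cosine similarity."""
    q = query_text_emb + est_gap
    q = q / np.linalg.norm(q)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return int(np.argmax(g @ q))

# Zero-shot "transfer": the shifted text query for item 7 lands near
# image 7 in the gallery, despite never seeing paired data.
best = retrieve(text_emb[7], image_emb)
```

Under this simulation the shifted query recovers the matching gallery index, which is the intuition behind text-only training preserving retrieval performance; the real framework operates on actual encoder geometry rather than synthetic offsets.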


