[2510.02789] Align Your Query: Representation Alignment for Multimodality Medical Object Detection

arXiv - AI · 4 min read

About this article

Computer Science > Computer Vision and Pattern Recognition

arXiv:2510.02789 (cs)
[Submitted on 3 Oct 2025 (v1), last revised 31 Mar 2026 (this version, v2)]

Title: Align Your Query: Representation Alignment for Multimodality Medical Object Detection
Authors: Ara Seo, Bryan Sangwoo Kim, Hyungjin Chung, Jong Chul Ye

Abstract: Medical object detection suffers when a single detector is trained on mixed medical modalities (e.g., CXR, CT, MRI) due to heterogeneous statistics and disjoint representation spaces. To address this challenge, we turn to representation alignment, an approach that has proven effective for bringing features from different sources into a shared space. Specifically, we target the representations of DETR-style object queries and propose a simple, detector-agnostic framework to align them with modality context. First, we define modality tokens: compact, text-derived embeddings encoding imaging modality that are lightweight and require no extra annotations. We integrate the modality tokens into the detection process via Multimodality Context Attention (MoCA), mixing object-query representations via self-attention to propagate modality context within the query set. This preserves DETR-style architectures and adds negligible latency while injecting modality cues into object queries. We fur...
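As a rough illustration of the idea described in the abstract, the sketch below mixes a modality token into a set of object queries with a single self-attention step. It is not the authors' implementation: the function name `modality_context_attention`, the identity projections, the single head, the residual connection, and the toy vectors are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def modality_context_attention(queries, modality_token):
    """Hypothetical sketch of attention over [modality token; object queries].

    Assumptions (not taken from the paper): one head, identity Q/K/V
    projections, a residual connection, and the modality token is used
    only as a key/value, so the output has one vector per input query.
    """
    tokens = [modality_token] + queries  # keys and values
    dim = len(modality_token)
    scale = 1.0 / math.sqrt(dim)
    updated = []
    for q in queries:
        # Attention weights of this query over the token and all queries.
        weights = softmax([dot(q, k) * scale for k in tokens])
        # Weighted sum of the values gives the context vector.
        context = [
            sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(dim)
        ]
        # Residual connection keeps the original query content.
        updated.append([qi + ci for qi, ci in zip(q, context)])
    return updated


# Toy data: three 4-dimensional object queries and two made-up modality tokens.
queries = [[0.1, 0.0, 0.2, 0.0], [0.0, 0.3, 0.0, 0.1], [0.2, 0.2, 0.0, 0.0]]
ct_token = [1.0, 0.0, 0.0, 0.0]   # stand-in for a "CT" embedding
mri_token = [0.0, 1.0, 0.0, 0.0]  # stand-in for an "MRI" embedding

out_ct = modality_context_attention(queries, ct_token)
out_mri = modality_context_attention(queries, mri_token)
```

In a real DETR-style decoder the token would presumably come from a text encoder and the attention would use learned projections. The only point here is that the same queries come out differently depending on which modality token they attend to, while the number and shape of the queries stay unchanged.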

Originally published on April 01, 2026. Curated by AI News.
