[2510.02789] Align Your Query: Representation Alignment for Multimodality Medical Object Detection

arXiv - AI April 01, 2026 4 min read

About this article

Abstract page for arXiv paper 2510.02789: Align Your Query: Representation Alignment for Multimodality Medical Object Detection

Computer Science > Computer Vision and Pattern Recognition
arXiv:2510.02789 (cs)
[Submitted on 3 Oct 2025 (v1), last revised 31 Mar 2026 (this version, v2)]

Title: Align Your Query: Representation Alignment for Multimodality Medical Object Detection

Authors: Ara Seo, Bryan Sangwoo Kim, Hyungjin Chung, Jong Chul Ye

Abstract: Medical object detection suffers when a single detector is trained on mixed medical modalities (e.g., CXR, CT, MRI) due to heterogeneous statistics and disjoint representation spaces. To address this challenge, we turn to representation alignment, an approach that has proven effective for bringing features from different sources into a shared space. Specifically, we target the representations of DETR-style object queries and propose a simple, detector-agnostic framework to align them with modality context. First, we define modality tokens: compact, text-derived embeddings encoding imaging modality that are lightweight and require no extra annotations. We integrate the modality tokens into the detection process via Multimodality Context Attention (MoCA), mixing object-query representations via self-attention to propagate modality context within the query set. This preserves DETR-style architectures and adds negligible latency while injecting modality cues into object queries. We fur...
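To make the MoCA idea in the abstract more concrete, here is a minimal pure-Python sketch of mixing a modality token into a set of object queries with self-attention. It is an illustration under stated assumptions, not the authors' implementation: the function names (`self_attention`, `moca_sketch`), the identity query/key/value projections, the choice to append the token and drop it afterwards, the residual connection, and the one-hot placeholder tokens are all hypothetical. The abstract only states that modality tokens are text-derived embeddings and that self-attention propagates modality context within the query set.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention with identity projections.

    A real layer would use learned query/key/value projections; they are
    omitted here (an assumption) to keep the sketch short.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    mixed = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in tokens]
        weights = softmax(scores)
        mixed.append(
            [sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(dim)]
        )
    return mixed


def moca_sketch(object_queries, modality_token):
    """Hypothetical helper: append the modality token to the query set,
    run self-attention, and return only the updated object queries."""
    tokens = object_queries + [modality_token]
    mixed = self_attention(tokens)
    # Residual connection, then drop the modality token so the number of
    # queries seen by the rest of the detector is unchanged.
    return [
        [q_i + m_i for q_i, m_i in zip(q, m)]
        for q, m in zip(object_queries, mixed[:-1])
    ]


random.seed(0)
queries = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(3)]
ct_token = [1.0, 0.0, 0.0, 0.0]   # placeholder for a text-derived "CT" embedding
mri_token = [0.0, 1.0, 0.0, 0.0]  # placeholder for a text-derived "MRI" embedding

ct_queries = moca_sketch(queries, ct_token)
mri_queries = moca_sketch(queries, mri_token)
```

In this sketch the same three queries come out different depending on which modality token they attended to, while the number and width of the queries stay fixed, which is one way to read the abstract's claim that the approach preserves DETR-style architectures.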

Originally published on April 01, 2026. Curated by AI News.

LLMs

[2507.11539] Streaming 4D Visual Geometry Transformer

Abstract page for arXiv paper 2507.11539: Streaming 4D Visual Geometry Transformer

arXiv - AI · 4 min · about 3 hours ago

Machine Learning

[2603.29927] End-to-End Image Compression with Segmentation Guided Dual Coding for Wind Turbines

Abstract page for arXiv paper 2603.29927: End-to-End Image Compression with Segmentation Guided Dual Coding for Wind Turbines

arXiv - AI · 4 min · about 3 hours ago

Computer Vision

[2603.29694] Exploring the Impact of Skin Color on Skin Lesion Segmentation

Abstract page for arXiv paper 2603.29694: Exploring the Impact of Skin Color on Skin Lesion Segmentation

arXiv - AI · 4 min · about 3 hours ago

Machine Learning

[2603.29535] Quantization with Unified Adaptive Distillation to enable multi-LoRA based one-for-all Generative Vision Models on edge

Abstract page for arXiv paper 2603.29535: Quantization with Unified Adaptive Distillation to enable multi-LoRA based one-for-all Generati...

arXiv - AI · 4 min · about 3 hours ago
