[2510.02789] Align Your Query: Representation Alignment for Multimodality Medical Object Detection
arXiv:2510.02789 (cs), Computer Vision and Pattern Recognition
Submitted on 3 Oct 2025 (v1), last revised 31 Mar 2026 (this version, v2)
Authors: Ara Seo, Bryan Sangwoo Kim, Hyungjin Chung, Jong Chul Ye

Abstract: Medical object detection suffers when a single detector is trained on mixed medical modalities (e.g., CXR, CT, MRI) due to heterogeneous statistics and disjoint representation spaces. To address this challenge, we turn to representation alignment, an approach that has proven effective for bringing features from different sources into a shared space. Specifically, we target the representations of DETR-style object queries and propose a simple, detector-agnostic framework to align them with modality context. First, we define modality tokens: compact, text-derived embeddings encoding imaging modality that are lightweight and require no extra annotations. We integrate the modality tokens into the detection process via Multimodality Context Attention (MoCA), mixing object-query representations via self-attention to propagate modality context within the query set. This preserves DETR-style architectures and adds negligible latency while injecting modality cues into object queries. We fur...
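To make the MoCA idea more concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes a single attention head, random projection weights, a residual connection, and a fixed random vector per modality standing in for the paper's text-derived modality tokens. The function names `modality_token` and `moca_self_attention` are hypothetical. The sketch appends one modality token to the DETR-style query set, runs self-attention over the combined set, and then drops the token so the decoder still receives the same number of queries.

```python
import numpy as np


# Hypothetical stand-in for a text-derived modality token: instead of encoding
# text, each modality name is mapped to a fixed random vector.
def modality_token(name: str, dim: int) -> np.ndarray:
    seed = sum(ord(c) for c in name)  # simple deterministic seed from the name
    rng = np.random.default_rng(seed)
    return rng.standard_normal(dim) / np.sqrt(dim)


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def moca_self_attention(queries, mod_tok, w_q, w_k, w_v):
    """Single-head self-attention over object queries plus one modality token.

    queries: (N, D) DETR-style object queries
    mod_tok: (D,) modality token
    Returns updated queries of shape (N, D). The modality token is dropped after
    mixing, so downstream layers still see exactly N queries.
    """
    n, d = queries.shape
    tokens = np.vstack([queries, mod_tok[None, :]])  # (N+1, D)
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    attn = softmax(q @ k.T / np.sqrt(d), axis=-1)  # (N+1, N+1) attention weights
    mixed = attn @ v  # every query can attend to the modality token
    return queries + mixed[:n]  # assumed residual; keep only the N query rows


rng = np.random.default_rng(0)
N, D = 100, 256
queries = rng.standard_normal((N, D))
w_q, w_k, w_v = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))

out_ct = moca_self_attention(queries, modality_token("CT", D), w_q, w_k, w_v)
out_mri = moca_self_attention(queries, modality_token("MRI", D), w_q, w_k, w_v)
print(out_ct.shape)  # (100, 256)
```

Because the modality token is removed after mixing, the query count and the rest of the decoder are unchanged, which is one way to read the abstract's claim that MoCA preserves DETR-style architectures. The same queries yield different outputs for CT and MRI, since the modality token enters as an extra key and value.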