[2604.04908] HI-MoE: Hierarchical Instance-Conditioned Mixture-of-Experts for Object Detection
Computer Science > Machine Learning
arXiv:2604.04908 (cs)
[Submitted on 6 Apr 2026]

Title: HI-MoE: Hierarchical Instance-Conditioned Mixture-of-Experts for Object Detection
Authors: Vadim Vashkelis, Natalia Trukhina

Abstract: Mixture-of-Experts (MoE) architectures enable conditional computation by activating only a subset of model parameters for each input. Although sparse routing has been highly effective in language models and has also shown promise in vision, most vision MoE methods operate at the image or patch level. This granularity is poorly aligned with object detection, where the fundamental unit of reasoning is an object query corresponding to a candidate instance. We propose Hierarchical Instance-Conditioned Mixture-of-Experts (HI-MoE), a DETR-style detection architecture that performs routing in two stages: a lightweight scene router first selects a scene-consistent expert subset, and an instance router then assigns each object query to a small number of experts within that subset. This design aims to preserve sparse computation while better matching the heterogeneous, instance-centric structure of detection. In the current draft, experiments are concentrated on COCO with preliminary specialization analysis on LVIS. Under these settings, HI-MoE improves over a dense DINO baseline and o...
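
To make the two-stage routing described in the abstract concrete, the following is a minimal PyTorch sketch of scene-then-instance expert selection over a batch of DETR-style object-query embeddings. The module name HierarchicalMoE, the pooled scene descriptor, the dimensions, and the top-k values (scene_k, instance_k) are illustrative assumptions for this sketch, not the paper's actual configuration.

# Minimal sketch of hierarchical scene -> instance MoE routing.
# Assumes each image yields a set of object-query embeddings (DETR-style).
import torch
import torch.nn as nn
import torch.nn.functional as F


class HierarchicalMoE(nn.Module):
    def __init__(self, dim=256, num_experts=8, scene_k=4, instance_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        self.scene_router = nn.Linear(dim, num_experts)     # routes a pooled scene vector
        self.instance_router = nn.Linear(dim, num_experts)  # routes each object query
        self.scene_k = scene_k
        self.instance_k = instance_k

    def forward(self, queries):
        # queries: (batch, num_queries, dim) object-query embeddings
        B, Q, D = queries.shape

        # Stage 1: the scene router selects a scene-consistent subset of experts
        scene_feat = queries.mean(dim=1)                      # (B, D) pooled scene descriptor
        scene_logits = self.scene_router(scene_feat)          # (B, num_experts)
        scene_experts = scene_logits.topk(self.scene_k, dim=-1).indices  # (B, scene_k)

        # Stage 2: the instance router assigns each query to experts within that subset
        inst_logits = self.instance_router(queries)           # (B, Q, num_experts)
        mask = torch.full_like(inst_logits, float("-inf"))
        mask.scatter_(-1, scene_experts.unsqueeze(1).expand(-1, Q, -1), 0.0)
        inst_logits = inst_logits + mask                      # experts outside the subset are excluded

        weights, chosen = inst_logits.topk(self.instance_k, dim=-1)      # (B, Q, instance_k)
        weights = F.softmax(weights, dim=-1)

        # Combine the outputs of the selected experts per query
        out = torch.zeros_like(queries)
        for slot in range(self.instance_k):
            idx = chosen[..., slot]                           # (B, Q) expert index per query
            w = weights[..., slot].unsqueeze(-1)              # (B, Q, 1) routing weight
            for e, expert in enumerate(self.experts):
                sel = idx == e                                # queries routed to expert e in this slot
                if sel.any():
                    out[sel] += w[sel] * expert(queries[sel])
        return out


if __name__ == "__main__":
    moe = HierarchicalMoE()
    x = torch.randn(2, 100, 256)   # 2 images, 100 object queries each
    print(moe(x).shape)            # torch.Size([2, 100, 256])

In this sketch sparsity comes from the instance-level top-k: each query only touches instance_k experts, and those experts are constrained to the scene-selected subset, which is the hierarchical structure the abstract describes.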