[2603.22179] MARCUS: An agentic, multimodal vision-language model for cardiac diagnosis and management
Computer Science > Artificial Intelligence
arXiv:2603.22179 (cs)
[Submitted on 23 Mar 2026]

Title: MARCUS: An agentic, multimodal vision-language model for cardiac diagnosis and management
Authors: Jack W O'Sullivan, Mohammad Asadi, Lennart Elbe, Akshay Chaudhari, Tahoura Nedaee, Francois Haddad, Michael Salerno, Li Fe-Fei, Ehsan Adeli, Rima Arnaout, Euan A Ashley

Abstract: Cardiovascular disease remains the leading cause of global mortality, with progress hindered by human interpretation of complex cardiac tests. Current AI vision-language models are limited to single-modality inputs and are non-interactive. We present MARCUS (Multimodal Autonomous Reasoning and Chat for Ultrasound and Signals), an agentic vision-language system for end-to-end interpretation of electrocardiograms (ECGs), echocardiograms, and cardiac magnetic resonance imaging (CMR), both independently and as multimodal input. MARCUS employs a hierarchical agentic architecture comprising modality-specific vision-language expert models, each integrating domain-trained visual encoders with multi-stage language model optimization, coordinated by a multimodal orchestrator. Trained on 13.5 million images (0.25M ECGs, 1.3M echocardiogram images, 12M CMR images) and our novel expert-curated dataset spanning 1.6 million questions, MA...
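The abstract describes a hierarchical design in which modality-specific expert models are coordinated by a multimodal orchestrator. A minimal sketch of what such an orchestrator-over-experts routing pattern could look like is below; all class, function, and field names here are illustrative assumptions, not the paper's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical sketch: one callable "expert" per cardiac modality,
# coordinated by an orchestrator that routes inputs and fuses findings.

@dataclass
class Finding:
    modality: str
    summary: str

def ecg_expert(data: str) -> Finding:
    # Stand-in for an ECG-specific vision-language model.
    return Finding("ecg", f"ECG interpretation of {data}")

def echo_expert(data: str) -> Finding:
    # Stand-in for an echocardiogram-specific expert model.
    return Finding("echo", f"Echo interpretation of {data}")

def cmr_expert(data: str) -> Finding:
    # Stand-in for a cardiac-MRI-specific expert model.
    return Finding("cmr", f"CMR interpretation of {data}")

class Orchestrator:
    """Routes each input to its modality expert, then fuses the findings."""

    def __init__(self, experts: Dict[str, Callable[[str], Finding]]):
        self.experts = experts

    def run(self, inputs: Dict[str, str]) -> str:
        findings: List[Finding] = [
            self.experts[m](data) for m, data in inputs.items() if m in self.experts
        ]
        # A real agentic system would synthesize these with a language model;
        # here we simply concatenate the per-modality summaries.
        return " | ".join(f.summary for f in findings)

orch = Orchestrator({"ecg": ecg_expert, "echo": echo_expert, "cmr": cmr_expert})
report = orch.run({"ecg": "lead II trace", "echo": "apical 4-chamber clip"})
print(report)
```

The key design point sketched here is that each modality keeps its own domain-trained expert, while the orchestrator owns only routing and fusion, so new modalities can be added without retraining the others.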