[2602.15909] Resp-Agent: An Agent-Based System for Multimodal Respiratory Sound Generation and Disease Diagnosis
Summary
The paper presents Resp-Agent, an agent-based multimodal system for respiratory sound generation and disease diagnosis, designed to overcome two limitations of deep learning-based respiratory auscultation.
Why It Matters
Information loss and data scarcity are major barriers in respiratory sound analysis: converting recordings into spectrograms discards transient acoustic events and clinical context, and the available data are limited and severely class-imbalanced. Resp-Agent addresses both. By improving diagnostic accuracy and robustness, this work could help improve patient care and outcomes in respiratory health.
Key Takeaways
- An Active Adversarial Curriculum Agent (Thinker-A²CA) acts as a central controller: it identifies the diagnoser's weaknesses and schedules targeted data synthesis in a closed loop.
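- A Modality-Weaving Diagnoser combines electronic health record (EHR) data with audio tokens via Strategic Global Attention and sparse audio anchors, capturing both long-range clinical context and millisecond-level transients.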
- A new Flow Matching Generator adapts a text-only large language model (LLM) to synthesize challenging diagnostic samples.
- Resp-229k, a benchmark corpus of audio recordings and clinical narratives, supports training and evaluation of the system.
- The authors report superior performance across diverse evaluation settings, particularly under data scarcity.
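The closed loop attributed to Thinker-A²CA (measure where the diagnoser is weak, then schedule synthesis aimed at those weaknesses) can be illustrated with a small sketch. Everything here is a hypothetical stand-in, not the authors' policy: the functions `schedule_synthesis` and `run_curriculum`, the choice of per-class recall as the weakness signal, and the proportional-to-error budget split are all assumptions made for illustration. The "adversarial" aspect of the real agent is not modelled.

```python
from typing import Callable, Dict, List


def schedule_synthesis(per_class_recall: Dict[str, float], budget: int) -> Dict[str, int]:
    """Split `budget` synthetic samples across classes in proportion to error (1 - recall).

    Hypothetical allocation rule for illustration only; not the Thinker-A2CA policy.
    """
    errors = {c: max(0.0, 1.0 - r) for c, r in per_class_recall.items()}
    total = sum(errors.values())
    if total == 0.0:
        return {c: 0 for c in per_class_recall}  # no measured weakness: nothing to synthesize
    raw = {c: budget * e / total for c, e in errors.items()}
    alloc = {c: int(v) for c, v in raw.items()}  # floor each class's share
    leftover = budget - sum(alloc.values())
    # Largest-remainder rounding so the allocation sums exactly to `budget`.
    for c in sorted(raw, key=lambda k: raw[k] - alloc[k], reverse=True)[:leftover]:
        alloc[c] += 1
    return alloc


def run_curriculum(
    evaluate: Callable[[], Dict[str, float]],
    synthesize_and_train: Callable[[Dict[str, int]], None],
    rounds: int,
    budget: int,
) -> List[Dict[str, int]]:
    """Hypothetical closed loop: measure weaknesses, schedule synthesis, retrain, repeat."""
    plans = []
    for _ in range(rounds):
        recall = evaluate()                        # 1. identify diagnostic weaknesses
        plan = schedule_synthesis(recall, budget)  # 2. schedule targeted synthesis
        synthesize_and_train(plan)                 # 3. generate samples and update the diagnoser
        plans.append(plan)
    return plans
```

For example, `schedule_synthesis({"normal": 0.95, "crackle": 0.60, "wheeze": 0.75}, 100)` returns `{"normal": 7, "crackle": 57, "wheeze": 36}`, so the weakest class receives most of the synthetic samples. In a real system, `evaluate` and `synthesize_and_train` would wrap the diagnoser and generator; here they are left as caller-supplied callables.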
Subject: Electrical Engineering and Systems Science > Audio and Speech Processing (eess)
arXiv:2602.15909, submitted on 16 Feb 2026
Authors: Pengfei Zhang, Tianxin Xie, Minghao Yang, Li Liu
Abstract: Deep learning-based respiratory auscultation is currently hindered by two fundamental challenges: (i) inherent information loss, as converting signals into spectrograms discards transient acoustic events and clinical context; (ii) limited data availability, exacerbated by severe class imbalance. To bridge these gaps, we present Resp-Agent, an autonomous multimodal system orchestrated by a novel Active Adversarial Curriculum Agent (Thinker-A²CA). Unlike static pipelines, Thinker-A²CA serves as a central controller that actively identifies diagnostic weaknesses and schedules targeted synthesis in a closed loop. To address the representation gap, we introduce a Modality-Weaving Diagnoser that weaves EHR data with audio tokens via Strategic Global Attention and sparse audio anchors, capturing both long-range clinical context and millisecond-level transients. To address the data gap, we design a Flow Matching Generator that adapts a text-only Large Language Model (LLM) via modality injec...
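The abstract says the Modality-Weaving Diagnoser uses "Strategic Global Attention and sparse audio anchors" but does not give the attention pattern. One familiar way to get both long-range context and fine local detail is a Longformer-style mask: a few tokens attend globally while the rest use a sliding window. The sketch below assumes that design. The token layout (EHR tokens first, then audio tokens), the anchor rule (every `anchor_stride`-th audio token is global), and the name `build_attention_mask` are hypothetical, not taken from the paper.

```python
from typing import List


def build_attention_mask(num_ehr: int, num_audio: int, window: int, anchor_stride: int) -> List[List[bool]]:
    """Return mask[i][j] == True when query token i may attend to key token j.

    Assumed layout: [EHR tokens ..., audio tokens ...]. All EHR tokens and every
    `anchor_stride`-th audio token (a hypothetical "sparse anchor") are global.
    """
    n = num_ehr + num_audio
    is_global = [i < num_ehr or (i - num_ehr) % anchor_stride == 0 for i in range(n)]
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if is_global[i] or is_global[j]:
                # Global tokens attend to, and are attended by, every position.
                mask[i][j] = True
            elif abs(i - j) <= window:
                # Ordinary audio tokens only see nearby audio tokens (local detail).
                mask[i][j] = True
    return mask
```

With `build_attention_mask(num_ehr=2, num_audio=8, window=1, anchor_stride=4)`, each ordinary audio token attends to at most `2 * window + 1` local neighbours plus every global token. Every audio position therefore reaches the EHR context directly and reaches distant audio through the anchors. A real model would convert this boolean mask into additive biases or pass it to its attention kernel.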