[2602.18452] RA-QA: Towards Respiratory Audio-based Health Question Answering
Summary
The paper presents RA-QA, a dataset and benchmark for health question answering over respiratory audio. It targets a gap the authors identify: the lack of intelligent systems that can interact in natural language during real-time consultations.
Why It Matters
Respiratory diseases are a leading cause of death globally, which makes early and accessible screening important. This work introduces a structured dataset that links respiratory audio to natural-language questions and answers, a step toward diagnostic tools that can interact with patients rather than only output a classification.
Key Takeaways
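- RA-QA is described as the first multimodal QA dataset focused specifically on respiratory health, pairing respiratory audio with natural language.
- The dataset contains about 7.5 million QA pairs spanning more than 60 attributes and three question types, built from 11 harmonized respiratory audio datasets.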
- The study establishes a benchmark for comparing audio-text generation models with traditional classifiers.
- Performance variations across attributes and question types provide insights for future model improvements.
- The work highlights the potential for more interactive and intelligent diagnostic tools in healthcare.
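To make the idea of attribute-based QA pairs concrete, here is a minimal sketch of how one harmonized recording might be expanded into question-answer pairs. Everything in it is an assumption for illustration: the `QAPair` fields, the `TEMPLATES` wording, the attribute names `diagnosis` and `sound_type`, and the sample record are hypothetical and are not taken from the RA-QA paper or its released data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QAPair:
    """One question-answer pair tied to an audio recording (hypothetical schema)."""
    audio_id: str
    attribute: str
    question: str
    answer: str


# Hypothetical templates: one fixed question per metadata attribute.
TEMPLATES = {
    "diagnosis": "What condition is associated with this recording?",
    "sound_type": "What kind of respiratory sound is this?",
}


def build_qa_pairs(record: dict) -> list[QAPair]:
    """Create one QA pair for each attribute that is present in the record."""
    pairs = []
    for attribute, question in TEMPLATES.items():
        value = record.get(attribute)
        if value is None:
            # Source datasets label different attributes, so skip missing ones.
            continue
        pairs.append(QAPair(record["audio_id"], attribute, question, str(value)))
    return pairs


# Illustrative record only; not real RA-QA data.
record = {"audio_id": "rec_0001", "diagnosis": "asthma", "sound_type": None}
pairs = build_qa_pairs(record)
for pair in pairs:
    print(pair.question, "->", pair.answer)
```

Skipping attributes that a record lacks is one simple way to combine datasets that were annotated differently; the actual harmonization procedure used by the authors may differ.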
Computer Science > Sound
arXiv:2602.18452 (cs)
[Submitted on 4 Feb 2026]
Title: RA-QA: Towards Respiratory Audio-based Health Question Answering
Authors: Gaia A. Bertolino, Yuwei Zhang, Tong Xia, Domenico Talia, Cecilia Mascolo
Abstract: Respiratory diseases are a leading cause of death globally, highlighting the urgent need for early and accessible screening methods. While some lung auscultation analysis has been automated and machine learning audio based models are able to predict respiratory pathologies, there remains a critical gap: the lack of intelligent systems that can interact in real-time consultations using natural language. Unlike other clinical domains, such as electronic health records, radiological images, and biosignals, where numerous question-answering (QA) datasets and models have been established, audio-based modalities remain notably underdeveloped. We curated and harmonized data from 11 diverse respiratory audio datasets to construct the first Respiratory Audio Question Answering (RA-QA) dataset. As the first multimodal QA resource of its kind focused specifically on respiratory health, RA-QA bridges clinical audio and natural language in a structured, scalable format. This new data resource contains about 7.5 million QA pairs spanning more than 60 attributes and three question types: singl...