[2602.17394] Voice-Driven Semantic Perception for UAV-Assisted Emergency Networks
Summary
The paper presents SIREN, an AI framework for enhancing UAV-assisted emergency networks by converting voice communications into structured data for improved situational awareness.
Why It Matters
Emergency response often takes place where terrestrial infrastructure is degraded or unavailable, and first responders still rely on voice radio because it is robust. Voice traffic is unstructured, however, so it cannot feed automated UAV-assisted network management directly. SIREN targets this gap by converting voice traffic into structured, machine-readable information that can support decision-making.
Key Takeaways
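- SIREN integrates ASR with LLM-based semantic extraction and NLP validation to process voice traffic.
- Extracts structured information from emergency voice communications, including responding units, location references, emergency severity, and QoS requirements.
- Evaluated on synthetic emergency scenarios with controlled variations in language, speaker count, and background noise, where it is reported to perform robustly.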
- Identifies speaker diarization and geographic ambiguity as key challenges.
- Supports human-in-the-loop decision-making for emergency response.
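To make the idea of "structured, machine-readable information" concrete, here is a minimal Python sketch of a record holding the four kinds of fields the abstract names (responding units, location references, emergency severity, QoS requirements), plus a validation step for JSON text such as an LLM extraction stage might emit. The schema, the field names, the severity scale, and the `parse_report` function are all assumptions made for illustration; the paper's actual output format and validation rules are not given here.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field

# Hypothetical severity scale; the paper's actual label set is not given here.
SEVERITY_LEVELS = ("low", "medium", "high", "critical")


@dataclass
class EmergencyReport:
    """Structured record for one voice transmission (illustrative schema)."""
    units: list[str]
    locations: list[str]
    severity: str
    qos: dict[str, float] = field(default_factory=dict)


def parse_report(llm_json: str) -> EmergencyReport:
    """Validate JSON text assumed to come from an LLM extraction step."""
    data = json.loads(llm_json)

    # Normalise the severity label and reject anything outside the scale.
    severity = str(data.get("severity", "")).lower()
    if severity not in SEVERITY_LEVELS:
        raise ValueError(f"unknown severity: {severity!r}")

    # Drop empty or whitespace-only entries from the extracted lists.
    units = [str(u).strip() for u in data.get("units", []) if str(u).strip()]
    locations = [str(p).strip() for p in data.get("locations", []) if str(p).strip()]
    if not units:
        raise ValueError("no responding unit extracted")

    # Coerce QoS values to floats so downstream code sees one numeric type.
    qos = {str(k): float(v) for k, v in data.get("qos", {}).items()}
    return EmergencyReport(units, locations, severity, qos)


if __name__ == "__main__":
    sample = (
        '{"units": ["Engine 4"], "locations": ["north bridge"], '
        '"severity": "High", "qos": {"min_throughput_mbps": 2}}'
    )
    print(parse_report(sample))
```

Rejecting records with an unknown severity or no responding unit, rather than guessing a default, fits the human-in-the-loop framing: a record that fails validation can be sent to an operator instead of being acted on automatically.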
Computer Science > Networking and Internet Architecture
arXiv:2602.17394 (cs)
[Submitted on 19 Feb 2026]
Title: Voice-Driven Semantic Perception for UAV-Assisted Emergency Networks
Authors: Nuno Saavedra, Pedro Ribeiro, André Coelho, Rui Campos
Abstract: Unmanned Aerial Vehicle (UAV)-assisted networks are increasingly foreseen as a promising approach for emergency response, providing rapid, flexible, and resilient communications in environments where terrestrial infrastructure is degraded or unavailable. In such scenarios, voice radio communications remain essential for first responders due to their robustness; however, their unstructured nature prevents direct integration with automated UAV-assisted network management. This paper proposes SIREN, an AI-driven framework that enables voice-driven perception for UAV-assisted networks. By integrating Automatic Speech Recognition (ASR) with Large Language Model (LLM)-based semantic extraction and Natural Language Processing (NLP) validation, SIREN converts emergency voice traffic into structured, machine-readable information, including responding units, location references, emergency severity, and Quality-of-Service (QoS) requirements. SIREN is evaluated using synthetic emergency scenarios with controlled variations in language, speaker count, background noise, and m...