[2410.06355] UNCOM: Zero-shot Context-Aware Command Understanding for Tabletop Scenarios
Computer Science > Robotics
arXiv:2410.06355 (cs)
[Submitted on 8 Oct 2024 (v1), last revised 8 May 2026 (this version, v3)]
Title: UNCOM: Zero-shot Context-Aware Command Understanding for Tabletop Scenarios
Authors: Antonio Galiza Cerdeira Gonzalez, Paweł Gajewski, Bipin Indurkhya
Abstract: This paper presents UNCOM, a novel hybrid framework for interpreting natural human commands in tabletop scenarios. The system integrates multiple sources of information -- speech, gestures, and scene context -- to extract structured, actionable instructions for robots. Addressing the need for general-purpose human-robot interaction in domestic environments, UNCOM is designed for zero-shot operation, without reliance on predefined object models or training data specific to a given task. Using foundational and task-specific deep learning models, it provides out-of-the-box speech recognition, natural language understanding, gesture detection, and object segmentation. The modular architecture enhances transparency and explainability by explicitly parsing commands into object-action-target representations, enabling integration with symbolic robotic frameworks. We demonstrate the system on a TIAGo++ robot and provide an evaluation on a real-world data set of human-robot interaction scenarios, achieving an 8...