[2511.17649] SWITCH: Benchmarking Modeling and Handling of Tangible Interfaces in Long-horizon Embodied Scenarios
arXiv:2511.17649 (Computer Science > Computer Vision and Pattern Recognition)
Submitted on 20 Nov 2025 (v1), last revised 27 Feb 2026 (this version, v2)
Authors: Jieru Lin, Zhiwei Yu, Börje F. Karlsson

Abstract: Autonomous agents operating in the real world must interact continuously with existing physical and semantic infrastructure, track delayed consequences, and verify outcomes over time. Everyday environments are rich in tangible control interfaces (TCIs), e.g., light switches, appliance panels, and embedded GUIs. These interfaces pose core challenges for lifelong embodied agents, including partial observability, causal reasoning across time, and failure-aware verification under real-world constraints. Yet current benchmarks rarely consider such long-horizon interaction and causality requirements. We introduce SWITCH (Semantic World Interface Tasks for Control & Handling), an embodied, task-driven benchmark created through iterative releases to probe these gaps. Its first iteration, SWITCH-Basic, evaluates five complementary abilities (task-aware VQA, semantic UI grounding, action generation, state transition prediction, and result verification) under ego-centric RGB video input and devi...