[2509.16677] Segment-to-Act: Label-Noise-Robust Action-Prompted Video Segmentation Towards Embodied Intelligence

arXiv - Machine Learning March 05, 2026 4 min read

About this article

Abstract page for arXiv paper 2509.16677: Segment-to-Act: Label-Noise-Robust Action-Prompted Video Segmentation Towards Embodied Intelligence

Computer Science > Computer Vision and Pattern Recognition

arXiv:2509.16677 (cs)

[Submitted on 20 Sep 2025 (v1), last revised 4 Mar 2026 (this version, v2)]

Title: Segment-to-Act: Label-Noise-Robust Action-Prompted Video Segmentation Towards Embodied Intelligence

Authors: Wenxin Li, Kunyu Peng, Di Wen, Ruiping Liu, Mengfei Duan, Kai Luo, Kailun Yang

Abstract: Embodied intelligence relies on accurately segmenting objects actively involved in interactions. Action-based video object segmentation addresses this by linking segmentation with action semantics, but it depends on large-scale annotations and prompts that are costly, inconsistent, and prone to multimodal noise such as imprecise masks and referential ambiguity. To date, this challenge remains unexplored. In this work, we take the first step by studying action-based video object segmentation under label noise, focusing on two sources: textual prompt noise (category flips and within-category noun substitutions) and mask annotation noise (perturbed object boundaries to mimic imprecise supervision). Our contributions are threefold. First, we introduce two types of label noise for the action-based video object segmentation task. Second, we build the first benchmark for action-based video object segmentation under label noise, ActiSeg-...
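The two noise sources named in the abstract can be illustrated with a toy sketch. This is not the authors' procedure: the vocabulary `VOCAB`, the helpers `noisy_noun` and `shift_boundary`, the noise rate, and the one-pixel dilation/erosion are all assumptions made only for illustration. The abstract says boundaries are "perturbed" without specifying how.

```python
import random

# Hypothetical toy vocabulary (category -> nouns); not taken from the paper.
VOCAB = {
    "tool": ["knife", "spatula", "spoon"],
    "container": ["bowl", "cup", "pan"],
}

def category_of(noun):
    """Return the category that contains `noun` in the toy vocabulary."""
    for cat, nouns in VOCAB.items():
        if noun in nouns:
            return cat
    raise KeyError(noun)

def noisy_noun(noun, rate, mode, rng):
    """With probability `rate`, corrupt the noun of a text prompt.

    mode="within": substitute another noun from the same category.
    mode="flip":   substitute a noun from a different category.
    """
    if rng.random() >= rate:
        return noun  # prompt left clean
    cat = category_of(noun)
    if mode == "within":
        pool = [n for n in VOCAB[cat] if n != noun]
    elif mode == "flip":
        pool = [n for c, ns in VOCAB.items() if c != cat for n in ns]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return rng.choice(pool)

def shift_boundary(mask, grow):
    """Dilate (grow=True) or erode (grow=False) a binary mask by one pixel.

    Uses the 4-neighbourhood; pixels outside the image count as background.
    This is one assumed way to mimic an imprecise object boundary.
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            neigh = [mask[y][x]]
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                inside = 0 <= ny < h and 0 <= nx < w
                neigh.append(mask[ny][nx] if inside else 0)
            out[y][x] = max(neigh) if grow else min(neigh)
    return out

# Usage: a 3x3 object centred in a 5x5 frame.
rng = random.Random(0)
mask = [[1 if 1 <= y <= 3 and 1 <= x <= 3 else 0 for x in range(5)] for y in range(5)]
grown = shift_boundary(mask, grow=True)
shrunk = shift_boundary(mask, grow=False)
flipped = noisy_noun("knife", rate=1.0, mode="flip", rng=rng)
swapped = noisy_noun("knife", rate=1.0, mode="within", rng=rng)
```

In this sketch the text noise changes only the noun, so the action part of a prompt stays intact, and the mask noise changes only the object outline, not its location.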

Originally published on March 05, 2026. Curated by AI News.

Machine Learning

[2506.22504] Patch2Loc: Learning to Localize Patches for Unsupervised Brain Lesion Detection

Abstract page for arXiv paper 2506.22504: Patch2Loc: Learning to Localize Patches for Unsupervised Brain Lesion Detection

arXiv - Machine Learning · 4 min · about 12 hours ago

Machine Learning

[2508.00307] Acoustic Imaging for Low-SNR UAV Detection: Dense Beamformed Energy Maps and U-Net SELD

Abstract page for arXiv paper 2508.00307: Acoustic Imaging for Low-SNR UAV Detection: Dense Beamformed Energy Maps and U-Net SELD

arXiv - AI · 4 min · about 12 hours ago

Computer Vision

[2603.25524] CHIRP dataset: towards long-term, individual-level, behavioral monitoring of bird populations in the wild

Abstract page for arXiv paper 2603.25524: CHIRP dataset: towards long-term, individual-level, behavioral monitoring of bird populations i...

arXiv - AI · 4 min · about 12 hours ago

Machine Learning

[2603.25170] Knowledge-Guided Adversarial Training for Infrared Object Detection via Thermal Radiation Modeling

Abstract page for arXiv paper 2603.25170: Knowledge-Guided Adversarial Training for Infrared Object Detection via Thermal Radiation Modeling

arXiv - AI · 4 min · about 12 hours ago
