[2405.05523] Prompt When the Animal is: Temporal Animal Behavior Grounding with Positional Recovery Training
Summary
This paper introduces Positional Recovery Training (Port), a training framework for temporal grounding of animal behavior that addresses the sparsity and uniform distribution of behavior moments in such data.
Why It Matters
Temporal grounding is a core task in multimodal learning, and animal behavior data makes it harder because the moments to be located are sparse and uniformly distributed. By prompting the model with ground-truth start and end times during training, Port lets it focus on specific temporal regions, which could benefit fields such as robotics and AI-driven wildlife studies.
Key Takeaways
- Port improves temporal grounding by prompting the model with the start and end times of specific animal behaviors during training.
- It targets two properties of animal behavior data that make grounding hard: the sparsity of moments and their uniform distribution.
- Experiments on the Animal Kingdom dataset yield an IoU@0.3 of 38.52, placing Port among the top performers in the MMVRAC sub-track of the ICME 2024 Grand Challenges.
- Port extends the baseline model with a Recovering branch that reconstructs corrupted label sequences, and aligns distributions via a Dual-alignment method.
- The work contributes to multimodal learning, specifically to temporal grounding of animal behavior.
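To make the idea of corrupting and then recovering start/end labels concrete, here is a minimal sketch. It is not the authors' implementation: the source does not say how label sequences are corrupted or how the Recovering branch is trained. Gaussian noise on normalized (start, end) pairs, the L1 recovery loss, and the names `corrupt_spans` and `recovery_loss` are all assumptions made for illustration.

```python
import random
from typing import List, Optional, Sequence, Tuple

# A behavior moment as (start, end), normalized to [0, 1] of the video length.
Span = Tuple[float, float]


def corrupt_spans(
    spans: Sequence[Span],
    noise_scale: float = 0.1,
    rng: Optional[random.Random] = None,
) -> List[Span]:
    """Perturb ground-truth spans with Gaussian noise (assumed corruption scheme)."""
    rng = rng or random.Random()
    corrupted: List[Span] = []
    for start, end in spans:
        s = start + rng.gauss(0.0, noise_scale)
        e = end + rng.gauss(0.0, noise_scale)
        # Clamp to the valid range and keep start <= end.
        s = min(max(s, 0.0), 1.0)
        e = min(max(e, 0.0), 1.0)
        if s > e:
            s, e = e, s
        corrupted.append((s, e))
    return corrupted


def recovery_loss(recovered: Sequence[Span], clean: Sequence[Span]) -> float:
    """Mean L1 distance between recovered and clean boundaries (assumed objective)."""
    if len(recovered) != len(clean):
        raise ValueError("recovered and clean must have the same length")
    if not clean:
        return 0.0
    total = sum(abs(rs - cs) + abs(re - ce)
                for (rs, re), (cs, ce) in zip(recovered, clean))
    return total / (2 * len(clean))


if __name__ == "__main__":
    clean = [(0.20, 0.35), (0.60, 0.90)]
    noisy = corrupt_spans(clean, noise_scale=0.1, rng=random.Random(0))
    # A recovering model would take `noisy` (plus video and text features)
    # and be trained to drive this loss toward zero.
    print(noisy, recovery_loss(noisy, clean))
```

In this reading, the corrupted spans act as a noisy prompt derived from the ground truth, and the recovery objective rewards the model for moving them back toward the true boundaries.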
arXiv:2405.05523 (cs)
[Submitted on 9 May 2024 (v1), last revised 18 Feb 2026 (this version, v2)]
Title: Prompt When the Animal is: Temporal Animal Behavior Grounding with Positional Recovery Training
Authors: Sheng Yan, Xin Du, Zongying Li, Yi Wang, Hongcang Jin, Mengyuan Liu
Abstract: Temporal grounding is crucial in multimodal learning, but it poses challenges when applied to animal behavior data due to the sparsity and uniform distribution of moments. To address these challenges, we propose a novel Positional Recovery Training framework (Port), which prompts the model with the start and end times of specific animal behaviors during training. Specifically, Port enhances the baseline model with a Recovering branch to reconstruct corrupted label sequences and align distributions via a Dual-alignment method. This allows the model to focus on specific temporal regions prompted by ground-truth information. Extensive experiments on the Animal Kingdom dataset demonstrate the effectiveness of Port, achieving an IoU@0.3 of 38.52. It emerges as one of the top performers in the sub-track of MMVRAC in ICME 2024 Grand Challenges.
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
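The abstract reports its result as IoU@0.3. In temporal grounding this is commonly read as the percentage of queries whose top predicted segment overlaps the ground-truth segment with a temporal IoU of at least 0.3. That reading, and the function names below, are assumptions rather than something the source spells out. A minimal sketch of the computation:

```python
from typing import Sequence, Tuple

# A temporal segment as (start, end), e.g. in seconds.
Segment = Tuple[float, float]


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Intersection over union of two 1-D time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_iou(
    preds: Sequence[Segment],
    gts: Sequence[Segment],
    threshold: float = 0.3,
) -> float:
    """Percentage of queries whose top-1 prediction reaches the IoU threshold."""
    if len(preds) != len(gts):
        raise ValueError("preds and gts must have the same length")
    if not preds:
        return 0.0
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(preds, gts))
    return 100.0 * hits / len(preds)


if __name__ == "__main__":
    preds = [(0.0, 10.0), (0.0, 4.0)]
    gts = [(5.0, 15.0), (5.0, 9.0)]
    # First pair overlaps by 5 s out of a 15 s union (IoU = 1/3); second is disjoint.
    print(recall_at_iou(preds, gts, threshold=0.3))  # prints 50.0
```

Under this reading, a score of 38.52 would mean that roughly 38.5% of queries are localized with at least 0.3 overlap.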