[2602.14771] GOT-JEPA: Generic Object Tracking with Model Adaptation and Occlusion Handling using Joint-Embedding Predictive Architecture
Summary
GOT-JEPA is a model-predictive pretraining framework for generic object tracking. It extends JEPA from predicting image features to predicting tracking models, with the aim of improving generalization to unseen scenarios and occlusion perception.
Why It Matters
Recent generic object trackers are often optimized for their training targets, which limits robustness and generalization in unseen scenarios, and their occlusion reasoning remains coarse. GOT-JEPA targets both limitations, which matters for computer vision applications such as surveillance and autonomous systems.
Key Takeaways
- GOT-JEPA enhances object tracking by integrating model adaptation and occlusion handling.
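- A teacher predictor generates pseudo-tracking models from a clean current frame, and a student predictor learns to predict the same pseudo-tracking models from a corrupted version of that frame, given identical historical information.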
- OccuSolver improves occlusion perception and visibility estimation for better tracking performance.
- Extensive evaluations demonstrate improved generalization across multiple benchmarks.
- The approach is relevant for applications requiring robust tracking in dynamic environments.
Computer Science > Computer Vision and Pattern Recognition
[Submitted on 16 Feb 2026]
Authors: Shih-Fang Chen, Jun-Cheng Chen, I-Hong Jhuo, Yen-Yu Lin
Abstract: The human visual system tracks objects by integrating current observations with previously observed information, adapting to target and scene changes, and reasoning about occlusion at fine granularity. In contrast, recent generic object trackers are often optimized for training targets, which limits robustness and generalization in unseen scenarios, and their occlusion reasoning remains coarse, lacking detailed modeling of occlusion patterns. To address these limitations in generalization and occlusion perception, we propose GOT-JEPA, a model-predictive pretraining framework that extends JEPA from predicting image features to predicting tracking models. Given identical historical information, a teacher predictor generates pseudo-tracking models from a clean current frame, and a student predictor learns to predict the same pseudo-tracking models from a corrupted version of the current frame. This design provides stable pseudo supervision and explic...
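The teacher-student objective described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the linear predictors, the random-masking corruption, the mean-squared-error loss, and every name below (`corrupt`, `predict_tracking_model`, `student_loss`) are assumptions chosen only to make the idea concrete. The teacher sees the clean frame features, the student sees a corrupted copy, both receive the same history, and the student is scored on how closely it reproduces the teacher's pseudo-tracking model.

```python
import random


def corrupt(frame, mask_ratio, rng):
    """Zero out a random subset of feature entries (hypothetical corruption)."""
    n_mask = int(len(frame) * mask_ratio)
    masked = set(rng.sample(range(len(frame)), n_mask))
    return [0.0 if i in masked else v for i, v in enumerate(frame)]


def predict_tracking_model(weights, history, frame):
    """Toy linear predictor: maps [history; frame] to a tracking-model vector."""
    x = history + frame
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


def student_loss(teacher_w, student_w, history, frame, mask_ratio, rng):
    # Teacher: pseudo-tracking model from the clean frame, used as a fixed target.
    target = predict_tracking_model(teacher_w, history, frame)
    # Student: same history, but a corrupted view of the current frame.
    corrupted = corrupt(frame, mask_ratio, rng)
    prediction = predict_tracking_model(student_w, history, corrupted)
    return mse(prediction, target)


rng = random.Random(0)
history = [rng.uniform(-1, 1) for _ in range(4)]   # assumed 4-dim history features
frame = [rng.uniform(-1, 1) for _ in range(6)]     # assumed 6-dim frame features
teacher_w = [[rng.uniform(-1, 1) for _ in range(10)] for _ in range(3)]
student_w = [row[:] for row in teacher_w]          # student starts as a copy

loss_clean = student_loss(teacher_w, student_w, history, frame, 0.0, rng)
loss_corrupt = student_loss(teacher_w, student_w, history, frame, 0.5, rng)
print(loss_clean, loss_corrupt)
```

With no corruption and identical weights the student matches the teacher exactly, so the loss is zero. Once half of the frame features are masked, the loss becomes positive, and that gap is what the student would have to close during training by inferring the missing evidence from the history and the remaining features.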