[2602.08550] GOT-Edit: Geometry-Aware Generic Object Tracking via Online Model Editing
Summary
GOT-Edit integrates geometry-aware cues into a generic object tracker through online model editing, targeting scenarios where 2D-only trackers struggle, such as partial occlusion and distractors.
Why It Matters
This research addresses significant limitations in current object tracking methods, which often overlook 3D geometric information. By enhancing tracking robustness and accuracy, GOT-Edit could improve applications in robotics, surveillance, and autonomous systems, where reliable object tracking is critical.
Key Takeaways
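- GOT-Edit infers 3D geometric cues from a 2D video stream, using features from a pre-trained Visual Geometry Grounded Transformer, and integrates them into a generic object tracker.
- The method uses online model editing with null-space constrained updates to add geometric information while preserving semantic discrimination.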
- Experiments show GOT-Edit outperforms existing methods, especially under occlusion and clutter.
Computer Science > Computer Vision and Pattern Recognition
arXiv:2602.08550 (cs)
[Submitted on 9 Feb 2026 (v1), last revised 23 Feb 2026 (this version, v2)]
Title: GOT-Edit: Geometry-Aware Generic Object Tracking via Online Model Editing
Authors: Shih-Fang Chen, Jun-Cheng Chen, I-Hong Jhuo, Yen-Yu Lin
Abstract: Human perception for effective object tracking in a 2D video stream arises from the implicit use of prior 3D knowledge combined with semantic reasoning. In contrast, most generic object tracking (GOT) methods primarily rely on 2D features of the target and its surroundings while neglecting 3D geometric cues, which makes them susceptible to partial occlusion, distractors, and variations in geometry and appearance. To address this limitation, we introduce GOT-Edit, an online cross-modality model editing approach that integrates geometry-aware cues into a generic object tracker from a 2D video stream. Our approach leverages features from a pre-trained Visual Geometry Grounded Transformer to enable geometric cue inference from only a few 2D images. To tackle the challenge of seamlessly combining geometry and semantics, GOT-Edit performs online model editing with null-space constrained updates that incorporate geometric information while preserving semantic discrimination, yielding consistently better pe...
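
The abstract mentions null-space constrained updates but, in this excerpt, does not give their exact form. As a rough illustration of the general idea only (not the authors' formulation), the sketch below projects a weight update onto the subspace orthogonal to a set of "preserved" feature vectors, so that a linear scorer's responses on those features stay unchanged after the edit. All names (`project_to_null_space`, `preserved`, `raw_update`) and the toy numbers are hypothetical, and plain Python is used to keep the example self-contained.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def orthonormal_basis(vectors: List[Vector], tol: float = 1e-10) -> List[Vector]:
    """Gram-Schmidt: build an orthonormal basis for the span of `vectors`."""
    basis: List[Vector] = []
    for v in vectors:
        r = list(v)
        for b in basis:
            c = dot(r, b)
            r = [ri - c * bi for ri, bi in zip(r, b)]
        norm = dot(r, r) ** 0.5
        if norm > tol:  # skip vectors that are (numerically) linearly dependent
            basis.append([ri / norm for ri in r])
    return basis


def project_to_null_space(update: Vector, preserved: List[Vector]) -> Vector:
    """Remove from `update` every component lying in span(preserved).

    The result is orthogonal to each preserved vector, i.e. it lies in the
    null space of the matrix whose rows are the preserved vectors.
    """
    r = list(update)
    for b in orthonormal_basis(preserved):
        c = dot(r, b)
        r = [ri - c * bi for ri, bi in zip(r, b)]
    return r


# Hypothetical toy setup: a linear scorer w and two features whose
# scores must not change when the model is edited.
w = [1.0, -2.0, 0.5, 0.0]
preserved = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 1.0, 0.0]]
raw_update = [0.3, 0.4, -0.2, 0.9]  # stands in for an update carrying new information

safe_update = project_to_null_space(raw_update, preserved)
w_edited = [wi + ui for wi, ui in zip(w, safe_update)]

for k in preserved:
    # Scores on preserved features match before and after the edit
    # (up to floating-point error).
    print(dot(w, k), dot(w_edited, k))
```

In this toy case the projected update keeps only the part of `raw_update` that does not interfere with the preserved directions. How GOT-Edit chooses what to preserve, and which parameters it edits, is not specified in the excerpt above.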