[2602.15154] Loss Knows Best: Detecting Annotation Errors in Videos via Loss Trajectories
Summary
The paper presents a model-agnostic method for detecting annotation errors in video datasets by analyzing per-frame loss trajectories, improving the reliability of model training in computer vision tasks.
Why It Matters
High-quality video datasets are crucial for machine learning applications, yet they often contain annotation errors that can degrade model performance. This research provides a model-agnostic approach to identify such errors, which can significantly improve dataset quality and training outcomes in various video-based applications.
Key Takeaways
- Proposes a method to detect annotation errors using Cumulative Sample Loss (CSL).
- Effectively identifies mislabeled segments and temporally disordered frames in video datasets.
- Does not require ground truth for annotation errors, making it widely applicable.
- Demonstrated strong performance on datasets like EgoPER and Cholec80.
- Improves the reliability of model training in video-based machine learning.
Abstract
Authors: Praditha Alwis, Soumyadeep Chandra, Deepak Ravikumar, Kaushik Roy. Submitted on 16 Feb 2026 to Computer Science > Computer Vision and Pattern Recognition (arXiv:2602.15154).
High-quality video datasets are foundational for training robust models in tasks like action recognition, phase detection, and event segmentation. However, many real-world video datasets suffer from annotation errors such as *mislabeling*, where segments are assigned incorrect class labels, and *disordering*, where the temporal sequence does not follow the correct progression. These errors are particularly harmful in phase-annotated tasks, where temporal consistency is critical. We propose a novel, model-agnostic method for detecting annotation errors by analyzing the Cumulative Sample Loss (CSL)--defined as the average loss a frame incurs when passing through model checkpoints saved across training epochs. This per-frame loss trajectory acts as a dynamic fingerprint of frame-level learnability. Mislabeled or disordered frames tend to show consistently high or irregular loss patterns, as they remain difficult for the model to learn throughout training, while correctly labeled frames typically converge to low loss early. To compute ...
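The core idea described in the abstract can be sketched in a few lines: average each frame's loss over the checkpoints saved during training, then flag frames whose average is unusually high. This is a minimal NumPy illustration assuming per-frame losses have already been recorded at every saved checkpoint; the z-score threshold and helper names are illustrative choices, not taken from the paper.

```python
import numpy as np

def cumulative_sample_loss(per_checkpoint_losses):
    """Average each frame's loss over checkpoints saved across training.

    per_checkpoint_losses: array of shape (num_checkpoints, num_frames),
    where entry [c, f] is the loss of frame f under checkpoint c.
    Returns an array of shape (num_frames,): the CSL of each frame.
    """
    losses = np.asarray(per_checkpoint_losses, dtype=float)
    return losses.mean(axis=0)

def flag_suspect_frames(per_checkpoint_losses, z_thresh=2.0):
    """Flag frames whose CSL is unusually high relative to the dataset.

    Frames that stay hard to learn across training accumulate a high CSL;
    a simple z-score cutoff (an illustrative choice, not the paper's
    detection rule) marks them as candidate annotation errors.
    """
    csl = cumulative_sample_loss(per_checkpoint_losses)
    z = (csl - csl.mean()) / (csl.std() + 1e-8)
    return np.where(z > z_thresh)[0], csl

# Toy example: three frames observed at three checkpoints.
# Frames 0 and 2 converge to low loss; frame 1 stays hard to learn,
# as a mislabeled or disordered frame would.
losses = np.array([
    [1.00, 1.00, 1.00],
    [0.20, 0.90, 0.10],
    [0.05, 0.95, 0.02],
])
suspects, csl = flag_suspect_frames(losses, z_thresh=1.0)
```

With this toy data, only frame 1 exceeds the threshold, matching the intuition that its loss never converges while the others drop early.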