[2602.18525] Do Generative Metrics Predict YOLO Performance? An Evaluation Across Models, Augmentation Ratios, and Dataset Complexity
Summary
This paper evaluates whether generative image-quality metrics (e.g., FID) predict the downstream performance of YOLO object detectors across multiple datasets and synthetic-augmentation ratios.
Why It Matters
Understanding how generative metrics correlate with YOLO performance is crucial for improving synthetic data augmentation strategies in computer vision. This research provides insights that can enhance model training efficiency and accuracy, particularly in challenging detection scenarios.
Key Takeaways
- Generative metrics do not consistently predict YOLO performance across different datasets.
- Synthetic augmentation can significantly improve detection performance in complex scenarios.
- The correlation between generative metrics and detection performance depends heavily on the augmentation ratio and dataset characteristics.
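The first takeaway can be checked directly for any dataset: rank each generator by a pre-training generative metric and by the detector's resulting mAP, then compute a rank correlation. A minimal sketch using Spearman's rho; the FID and mAP numbers below are illustrative placeholders, not values from the paper:

```python
def ranks(xs):
    # Average ranks (1-based); ties receive the mean of their rank positions.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    # Spearman's rho is the Pearson correlation of the rank vectors.
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)

# Hypothetical per-generator values (lower FID = "better" synthetic set).
fid  = [12.3, 18.7, 25.1, 31.4, 40.2, 55.0]
map_ = [0.41, 0.44, 0.39, 0.42, 0.38, 0.40]   # mAP@0.50:0.95 after augmentation

# Negate FID so "better metric" and "better mAP" point the same way;
# a rho near 0 means the metric does not predict detection performance.
rho = spearman([-f for f in fid], map_)
print(round(rho, 3))  # → 0.486
```

A rho this far below 1 on real measurements would support the paper's claim that global metrics are unreliable predictors of detection mAP.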
Computer Science > Computer Vision and Pattern Recognition
arXiv:2602.18525 (cs)
[Submitted on 20 Feb 2026]
Authors: Vasile Marian, Yong-Bin Kang, Alexander Buddery
Abstract: Synthetic images are increasingly used to augment object-detection training sets, but reliably evaluating a synthetic dataset before training remains difficult: standard global generative metrics (e.g., FID) often do not predict downstream detection mAP. We present a controlled evaluation of synthetic augmentation for YOLOv11 across three single-class detection regimes -- Traffic Signs (sparse/near-saturated), Cityscapes Pedestrian (dense/occlusion-heavy), and COCO PottedPlant (multi-instance/high-variability). We benchmark six GAN-, diffusion-, and hybrid-based generators over augmentation ratios from 10% to 150% of the real training split, and train YOLOv11 both from scratch and with COCO-pretrained initialization, evaluating on held-out real test splits (mAP@0.50:0.95). For each dataset-generator-augmentation configuration, we compute pre-training dataset metrics under a matched-size bootstrap protocol, including (i) global feature-space metrics in both Incepti...
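The abstract's "matched-size bootstrap protocol" is not fully specified in this excerpt, but such protocols are commonly implemented by repeatedly subsampling both image sets to the same size before computing a metric, removing set-size bias from metrics (like FID) that are sensitive to sample count. A hedged sketch under that assumption, with a toy stand-in for the real metric function:

```python
import random

def matched_size_bootstrap(real, synthetic, metric, n_boot=100, seed=0):
    """Evaluate `metric` on equal-size subsamples of both sets.

    `metric(a, b)` is a placeholder for any set-level comparison
    (e.g., FID in a real pipeline); here it only needs two lists.
    Returns the mean and std of the bootstrap replicates.
    """
    rng = random.Random(seed)
    m = min(len(real), len(synthetic))  # matched subsample size
    vals = []
    for _ in range(n_boot):
        a = rng.sample(real, m)
        b = rng.sample(synthetic, m)
        vals.append(metric(a, b))
    mean = sum(vals) / n_boot
    var = sum((v - mean) ** 2 for v in vals) / n_boot
    return mean, var ** 0.5

# Toy stand-in metric: absolute difference of sample means.
toy_metric = lambda a, b: abs(sum(a) / len(a) - sum(b) / len(b))

random.seed(42)
real = [random.gauss(0.0, 1.0) for _ in range(500)]
fake = [random.gauss(0.3, 1.0) for _ in range(2000)]  # deliberately shifted
mean, std = matched_size_bootstrap(real, fake, toy_metric)
print(f"distance: {mean:.3f} +/- {std:.3f}")
```

Reporting the bootstrap mean and spread, rather than a single full-set value, lets configurations with very different synthetic-set sizes be compared on equal footing.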