[2602.18585] BloomNet: Exploring Single vs. Multiple Object Annotation for Flower Recognition Using YOLO Variants

arXiv - AI · 4 min read

Summary

The paper compares single-object versus multiple-object annotation for flower recognition across several YOLO models, introducing a new dataset (FloralSix) and reporting benchmark results for both regimes.

Why It Matters

This research is significant for advancements in automated agriculture, particularly in improving flower detection methods that can enhance crop monitoring and yield estimation. The findings contribute to the understanding of how annotation techniques and model architectures impact detection performance in different scenarios.

Key Takeaways

  • Introduces the FloralSix dataset for flower recognition.
  • Benchmarks YOLO models under single and multiple object annotation regimes.
  • YOLOv8m shows superior performance in sparse scenarios, while YOLOv12n excels in dense environments.
  • Training with the SGD optimizer consistently yields better results across models.
  • Findings support applications in non-destructive crop analysis and robotic pollination.

Computer Science > Computer Vision and Pattern Recognition
arXiv:2602.18585 (cs) · Submitted on 20 Feb 2026

Title: BloomNet: Exploring Single vs. Multiple Object Annotation for Flower Recognition Using YOLO Variants
Authors: Safwat Nusrat, Prithwiraj Bhattacharjee

Abstract: Precise localization and recognition of flowers are crucial for advancing automated agriculture, particularly in plant phenotyping, crop estimation, and yield monitoring. This paper benchmarks several YOLO architectures, including YOLOv5s, YOLOv8n/s/m, and YOLOv12n, for flower object detection under two annotation regimes: single-image single-bounding box (SISBB) and single-image multiple-bounding box (SIMBB). The FloralSix dataset, comprising 2,816 high-resolution photos of six different flower species, is also introduced; it is annotated for both dense (clustered) and sparse (isolated) scenarios. The models were evaluated using Precision, Recall, and Mean Average Precision (mAP) at IoU thresholds of 0.5 (mAP@0.5) and 0.5-0.95 (mAP@0.5:0.95). In SISBB, YOLOv8m (SGD) achieved the best results with Precision 0.956, Recall 0.951, mAP@0.5 0.978, and mAP@0.5:0.95 0.865, illustrating strong accuracy in detecting isolated flowers. With mAP@0.5 0.934 and mAP@0.5:0.95 0.752, YOLOv12n (SGD) outperformed the more complicated...
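The mAP figures above hinge on the IoU threshold: a predicted box counts as a true positive only if its overlap with a ground-truth box meets the threshold (0.5 for mAP@0.5). As a rough illustration of that matching criterion (not code from the paper), IoU between two axis-aligned boxes in (x1, y1, x2, y2) form can be computed like this:

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Hypothetical prediction vs. ground truth: IoU ≈ 0.391,
# so this detection would NOT count as a match at the 0.5 threshold.
pred = (10, 10, 50, 50)
gt = (20, 20, 60, 60)
print(round(iou(pred, gt), 3))
```

mAP@0.5:0.95 then averages the resulting average precision over IoU thresholds from 0.5 to 0.95 in steps of 0.05, which is why it is uniformly lower than mAP@0.5 in the reported results.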
