[2411.16196] Learn from Foundation Model: Fruit Detection Model without Manual Annotation
Computer Science > Computer Vision and Pattern Recognition

arXiv:2411.16196 (cs)

[Submitted on 25 Nov 2024 (v1), last revised 22 Mar 2026 (this version, v2)]

Title: Learn from Foundation Model: Fruit Detection Model without Manual Annotation

Authors: Yanan Wang, Zhenghao Fei, Ruichen Li, Yibin Ying

Abstract: Recent breakthroughs in large foundation models have made it possible to transfer knowledge pre-trained on vast datasets to domains with limited data availability. Agriculture is one of the domains that lack sufficient data. This study proposes a framework for training effective, domain-specific, small models from foundation models without manual annotation. Our approach begins with SDM (Segmentation-Description-Matching), a stage that leverages two foundation models: SAM2 (Segment Anything in Images and Videos) for segmentation and OpenCLIP (Open Contrastive Language-Image Pretraining) for zero-shot open-vocabulary classification. In the second stage, a novel knowledge distillation mechanism distills compact, edge-deployable models from SDM, improving both inference speed and perception accuracy. The complete method, termed SDM-D (Segmentation-Description-Matching-Distilling), demonstrates strong performance across various fruit detection tasks object de...
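
A minimal sketch of the SDM stage as described in the abstract, assuming the public SAM2 and OpenCLIP APIs: SAM2's automatic mask generator proposes class-agnostic masks, OpenCLIP scores each masked crop against a set of text descriptions, and the best-matching description becomes the pseudo-label. The checkpoint paths, config names, and prompt wording below are illustrative assumptions, not the authors' released code.

```python
# Sketch of Segmentation-Description-Matching (SDM) pseudo-labelling.
# Paths, configs, and prompts are placeholders, not the paper's code.
import numpy as np
import torch
import open_clip
from PIL import Image
from sam2.build_sam import build_sam2
from sam2.automatic_mask_generator import SAM2AutomaticMaskGenerator

device = "cuda" if torch.cuda.is_available() else "cpu"

# Foundation model 1: SAM2 proposes class-agnostic segmentation masks.
sam2 = build_sam2("configs/sam2.1/sam2.1_hiera_l.yaml",  # placeholder config
                  "checkpoints/sam2.1_hiera_large.pt",   # placeholder weights
                  device=device)
mask_generator = SAM2AutomaticMaskGenerator(sam2)

# Foundation model 2: OpenCLIP matches each mask to a text description.
clip_model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k")
clip_model = clip_model.to(device).eval()
tokenizer = open_clip.get_tokenizer("ViT-B-32")

# Hypothetical open-vocabulary descriptions for a strawberry scene.
descriptions = ["a ripe strawberry", "an unripe strawberry",
                "a leaf", "background"]
with torch.no_grad():
    text_feat = clip_model.encode_text(tokenizer(descriptions).to(device))
    text_feat /= text_feat.norm(dim=-1, keepdim=True)

def pseudo_label(image_path):
    """Return (mask, bbox, label, score) pseudo-annotations for one image."""
    image = np.array(Image.open(image_path).convert("RGB"))
    annotations = []
    for m in mask_generator.generate(image):
        x, y, w, h = map(int, m["bbox"])
        crop = image[y:y + h, x:x + w].copy()
        crop[~m["segmentation"][y:y + h, x:x + w]] = 0  # zero out background
        img_in = preprocess(Image.fromarray(crop)).unsqueeze(0).to(device)
        with torch.no_grad():
            img_feat = clip_model.encode_image(img_in)
            img_feat /= img_feat.norm(dim=-1, keepdim=True)
            sims = (img_feat @ text_feat.T).squeeze(0)
        best = int(sims.argmax())
        annotations.append((m["segmentation"], m["bbox"],
                            descriptions[best], float(sims[best])))
    return annotations
```

Masking out the background before encoding each crop keeps OpenCLIP's similarity score focused on the segmented object rather than its surroundings, which is what makes the zero-shot matching usable as an annotation signal.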
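The abstract describes the second-stage distillation mechanism only at a high level, so the following is a generic soft-target distillation loss (Hinton-style) shown purely to illustrate what distilling a compact student from SDM's outputs could look like; the function name and hyperparameters are placeholders, not the paper's specific mechanism.

```python
# Generic soft-target knowledge distillation loss, for illustration only.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      temperature=2.0, alpha=0.5):
    """Blend hard pseudo-label supervision with softened teacher matching."""
    # Hard loss against SDM's pseudo-labels (targets).
    hard = F.cross_entropy(student_logits, targets)
    # Soft loss: KL divergence between temperature-softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return alpha * hard + (1 - alpha) * soft
```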