[2602.13712] Fine-tuned Vision Language Model for Localization of Parasitic Eggs in Microscopic Images
Summary
This paper presents a fine-tuned Vision Language Model (VLM), based on Microsoft Florence, for localizing parasitic eggs in microscopic images. In preliminary results it outperforms object detection methods such as EfficientDet.
Why It Matters
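The research addresses the diagnosis of soil-transmitted helminth (STH) infections, which affect a large proportion of the global population, particularly in tropical and sub-tropical regions where specialized diagnostic expertise is scarce. Manual microscopy remains the diagnostic gold standard but is labour-intensive, time-consuming, and prone to human error. Automating the localization of parasitic eggs could improve diagnostic accuracy and efficiency and, in turn, public health outcomes.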
Key Takeaways
- In preliminary results, the proposed VLM achieves a mean intersection over union (mIoU) of 0.94, outperforming object detection methods such as EfficientDet.
- Automating parasitic egg localization can reduce human error and increase diagnostic efficiency.
- This model has potential applications in regions with limited access to specialized diagnostic expertise.
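The headline metric is mIoU, which scores how well predicted bounding boxes overlap the annotated eggs. The paper summary does not say exactly how boxes are matched before averaging, so the sketch below assumes one simple convention: each ground-truth egg is scored by its best-overlapping prediction (0 if there is none), and those scores are averaged. The names `iou` and `mean_iou` and the `(x1, y1, x2, y2)` pixel box format are illustrative assumptions, not the authors' code.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) in pixels, with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_iou(ground_truth: List[Box], predictions: List[Box]) -> float:
    """Hypothetical mIoU: score each ground-truth egg by its best-matching
    prediction (0.0 if there are no predictions), then average the scores."""
    if not ground_truth:
        return 0.0
    scores = [max((iou(gt, p) for p in predictions), default=0.0) for gt in ground_truth]
    return sum(scores) / len(scores)


# Two annotated eggs, one of which is found exactly: mIoU = (1.0 + 0.0) / 2.
print(mean_iou([(0, 0, 10, 10), (20, 20, 30, 30)], [(0, 0, 10, 10)]))  # 0.5
```

Taking the best match per ground-truth box lets one prediction cover several eggs; a stricter variant would enforce one-to-one matching (for example with the Hungarian algorithm) before averaging.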
arXiv:2602.13712 (cs) [Submitted on 14 Feb 2026]
Authors: Chan Hao Sien, Hezerul Abdul Karim, Nouar AlDahoul
Abstract: Soil-transmitted helminth (STH) infections continuously affect a large proportion of the global population, particularly in tropical and sub-tropical regions, where access to specialized diagnostic expertise is limited. Although manual microscopic diagnosis of parasitic eggs remains the diagnostic gold standard, the approach can be labour-intensive, time-consuming, and prone to human error. This paper aims to utilize a vision language model (VLM) such as Microsoft Florence that was fine-tuned to localize all parasitic eggs within microscopic images. The preliminary results show that our localization VLM performs comparatively better than the other object detection methods, such as EfficientDet, with an mIOU of 0.94. This finding demonstrates the potential of the proposed VLM to serve as a core component of an automated framework, offering a scalable engineering solution for intelligent parasitological diagnosis.
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as: arXiv:2602.13712 [cs.CV]
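A VLM such as Florence produces localizations as generated text rather than as regression outputs, so an automated pipeline needs a step that turns that text back into pixel boxes. The paper does not describe this step. The sketch below assumes a Florence-2-style format in which each object is written as a label followed by four location tokens `<loc_N>` (x1, y1, x2, y2), each coordinate quantized into 1000 bins per axis, and it maps each bin to its centre. `NUM_BINS`, `decode_detections`, and the special-token list are assumptions for illustration; check them against the actual model's processor before relying on them.

```python
import re
from typing import List, Tuple

NUM_BINS = 1000  # assumed number of quantization bins per image axis
# Assumed format: a label followed by exactly four location tokens (x1, y1, x2, y2).
DETECTION = re.compile(r"([^<]+)((?:<loc_\d+>){4})")
LOC = re.compile(r"<loc_(\d+)>")


def decode_detections(
    text: str, width: int, height: int
) -> List[Tuple[str, Tuple[float, float, float, float]]]:
    """Convert generated text into (label, pixel box) pairs."""
    # Drop sequence/padding markers (assumed names) so they are not read as labels.
    for special in ("<s>", "</s>", "<pad>"):
        text = text.replace(special, "")
    sx, sy = width / NUM_BINS, height / NUM_BINS  # pixels per bin on each axis
    results = []
    for label, locs in DETECTION.findall(text):
        x1, y1, x2, y2 = (int(v) for v in LOC.findall(locs))
        # Map each bin index to the centre of its bin in pixel units.
        box = ((x1 + 0.5) * sx, (y1 + 0.5) * sy, (x2 + 0.5) * sx, (y2 + 0.5) * sy)
        results.append((label.strip(), box))
    return results


print(decode_detections("</s><s>egg<loc_0><loc_0><loc_499><loc_999>", 1000, 500))
# [('egg', (0.5, 0.25, 499.5, 499.75))]
```

Boxes decoded this way can be passed directly to an IoU-based evaluation against the annotated eggs.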