[2602.13662] LeafNet: A Large-Scale Dataset and Comprehensive Benchmark for Foundational Vision-Language Understanding of Plant Diseases
Summary
LeafNet introduces a large-scale dataset and benchmark for evaluating vision-language models in plant disease diagnosis, highlighting significant performance disparities among models.
Why It Matters
This research addresses a critical gap in agricultural AI by providing a comprehensive dataset and benchmarking framework that can improve the accuracy of plant disease diagnosis and drive progress in multimodal models.
Key Takeaways
- LeafNet dataset includes 186,000 leaf images across 97 disease classes.
- Benchmarking reveals significant performance gaps in VLMs for plant pathology tasks.
- Multimodal models outperform traditional vision-only models in diagnostic accuracy.
- The study emphasizes the need for robust evaluation frameworks in AI-assisted agriculture.
- Fine-grained identification tasks show lower accuracy, indicating areas for improvement.
Computer Science > Computer Vision and Pattern Recognition
arXiv:2602.13662 (cs) [Submitted on 14 Feb 2026]
Authors: Khang Nguyen Quoc, Phuong D. Dao, Luyl-Da Quach
Abstract: Foundation models and vision-language pre-training have significantly advanced Vision-Language Models (VLMs), enabling multimodal processing of visual and linguistic data. However, their application in domain-specific agricultural tasks, such as plant pathology, remains limited due to the lack of large-scale, comprehensive multimodal image--text datasets and benchmarks. To address this gap, we introduce LeafNet, a comprehensive multimodal dataset, and LeafBench, a visual question-answering benchmark developed to systematically evaluate the capabilities of VLMs in understanding plant diseases. The dataset comprises 186,000 digital leaf images spanning 97 disease classes, paired with metadata, generating 13,950 question-answer pairs spanning six critical agricultural tasks. The questions assess various aspects of plant pathology understanding, including visual symptom recognition, taxonomic relationships, and diagnostic reasoning. Benchmarking 12 state-of-...
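The benchmark described above scores VLMs on question-answer pairs grouped into six agricultural tasks. A minimal per-task accuracy loop can sketch how such an evaluation might work; the record schema, answer-matching rule, and the dummy model below are illustrative assumptions, not LeafBench's actual protocol or API.

```python
# Hedged sketch of per-task VQA accuracy scoring. The data format and the
# exact-match rule are assumptions for illustration only; the real benchmark
# may use a different schema and answer-matching protocol.
from collections import defaultdict

def score_by_task(qa_pairs, predict):
    """Return {task: accuracy} for records with image, question, answer, task."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in qa_pairs:
        pred = predict(item["image"], item["question"])
        total[item["task"]] += 1
        if pred.strip().lower() == item["answer"].strip().lower():
            correct[item["task"]] += 1
    return {task: correct[task] / total[task] for task in total}

# Toy records and a dummy "model" that always answers "rust" (hypothetical):
qa = [
    {"image": "leaf1.jpg", "question": "Which disease?", "answer": "rust",   "task": "diagnosis"},
    {"image": "leaf2.jpg", "question": "Which disease?", "answer": "blight", "task": "diagnosis"},
    {"image": "leaf3.jpg", "question": "What is shown?", "answer": "rust",   "task": "symptoms"},
]
print(score_by_task(qa, lambda img, q: "rust"))  # {'diagnosis': 0.5, 'symptoms': 1.0}
```

Reporting accuracy per task rather than one aggregate number is what exposes the fine-grained gaps the summary mentions, since a model can score well on coarse diagnosis while failing symptom-level questions.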