[2603.17729] SARE: Sample-wise Adaptive Reasoning for Training-free Fine-grained Visual Recognition
Computer Science > Computer Vision and Pattern Recognition

arXiv:2603.17729 (cs)

[Submitted on 18 Mar 2026 (v1), last revised 24 Mar 2026 (this version, v2)]

Title: SARE: Sample-wise Adaptive Reasoning for Training-free Fine-grained Visual Recognition

Authors: Jingxiao Yang, DaLin He, Miao Pan, Ge Su, Wenqi Zhang, Yifeng Hu, Tangwei Li, Yuke Li, Xuhong Zhang

Abstract: Recent advances in Large Vision-Language Models (LVLMs) have enabled training-free Fine-Grained Visual Recognition (FGVR). However, effectively exploiting LVLMs for FGVR remains challenging due to the inherent visual ambiguity of subordinate-level categories. Existing methods predominantly adopt either retrieval-oriented or reasoning-oriented paradigms to tackle this challenge, but both are constrained by two fundamental limitations: (1) they apply the same inference pipeline to all samples without accounting for uneven recognition difficulty, leading to suboptimal accuracy and efficiency; (2) they lack mechanisms to consolidate and reuse error-specific experience, causing repeated failures on similar challenging cases. To address these limitations, we propose SARE, a Sample-wise Adaptive REasoning framework for training-free FGVR. Specifically, SARE adopts a cascaded design that combines fast candidate retrieval with...
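The cascaded, difficulty-aware pipeline described in the abstract can be sketched in a few lines. This is a hypothetical illustration only, not the authors' implementation: the stage functions, the confidence threshold, and the top-5 candidate cutoff are all assumptions. The core idea shown is that a cheap retrieval stage handles easy samples on its own, and the expensive LVLM reasoning stage is invoked only when retrieval confidence is low.

```python
# Hypothetical sketch of a sample-wise adaptive cascade: fast retrieval
# first, escalating to slower reasoning only for hard (low-confidence)
# samples. All names and thresholds are illustrative assumptions.
from typing import Callable, List, Tuple

def cascaded_recognize(
    image_id: str,
    retrieve: Callable[[str], List[Tuple[str, float]]],  # fast stage: (label, score) candidates
    reason: Callable[[str, List[str]], str],             # slow stage: reasoning over candidates
    confidence_threshold: float = 0.8,
) -> str:
    """Return a label, using the expensive reasoning stage only when needed."""
    candidates = retrieve(image_id)
    best_label, best_score = max(candidates, key=lambda c: c[1])
    if best_score >= confidence_threshold:
        # Easy sample: the retrieval stage alone is confident enough.
        return best_label
    # Hard sample: hand the top candidates to the reasoning stage.
    top_labels = [label for label, _ in sorted(candidates, key=lambda c: -c[1])[:5]]
    return reason(image_id, top_labels)

# Toy usage with stub stages standing in for the retrieval and LVLM components.
fake_retrieve = lambda img: [("sparrow", 0.95), ("finch", 0.03)]
fake_reason = lambda img, labels: labels[0]
print(cascaded_recognize("bird.jpg", fake_retrieve, fake_reason))
```

Under this sketch, per-sample cost adapts to difficulty: confident samples pay only the retrieval cost, while ambiguous ones additionally pay for the reasoning call.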