[2601.12357] SimpleMatch: A Simple and Strong Baseline for Semantic Correspondence
Summary
The paper presents SimpleMatch, a framework for semantic correspondence that delivers strong performance at low input resolutions, avoiding much of the computational overhead that high-resolution inputs impose.
Why It Matters
As semantic correspondence tasks increasingly rely on high-resolution images, SimpleMatch addresses the limitations of current methods by providing an efficient alternative that maintains performance without the associated computational costs. This research is significant for advancing computer vision applications where resource constraints are a concern.
Key Takeaways
- SimpleMatch uses a lightweight upsample decoder that progressively upsamples deep features to 1/4 resolution, recovering spatial detail at low input resolutions.
- The framework reduces training memory usage by 51% through sparse matching and window-based localization.
- Achieves superior performance on the SPair-71k benchmark with 84.1% PCK@0.1.
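- Addresses the irreversible fusion of adjacent keypoint features caused by deep downsampling, which occurs when semantically distinct keypoints fall within the same downsampled patch (e.g., 16x16).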
- Provides a practical baseline for future research in semantic correspondence.
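The PCK@0.1 figure quoted above can be made concrete with a small sketch. It assumes the common convention that a predicted keypoint counts as correct when it lies within alpha * max(h, w) of the ground truth, where (h, w) is the object bounding-box size. The function name `pck` and the sample coordinates are illustrative and are not taken from the paper.

```python
import math

def pck(pred, gt, bbox_hw, alpha=0.1):
    """Fraction of predicted keypoints within alpha * max(h, w) of ground truth.

    pred, gt: lists of (x, y) pixel coordinates in the target image.
    bbox_hw:  (height, width) of the object bounding box (assumed convention).
    """
    h, w = bbox_hw
    threshold = alpha * max(h, w)
    correct = sum(
        1
        for (px, py), (gx, gy) in zip(pred, gt)
        if math.hypot(px - gx, py - gy) <= threshold
    )
    return correct / len(gt)

# Illustrative data: three keypoints, bounding box of 100 x 80 pixels.
pred = [(10.0, 12.0), (50.0, 40.0), (90.0, 95.0)]
gt = [(12.0, 12.0), (70.0, 40.0), (91.0, 93.0)]
score = pck(pred, gt, bbox_hw=(100, 80))
print(f"PCK@0.1 = {score:.3f}")
```

With a threshold of 10 pixels, the first and third predictions count as correct and the second (20 pixels off) does not, so two of the three keypoints are matched.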
arXiv:2601.12357 (cs), Computer Science > Computer Vision and Pattern Recognition
Submitted on 18 Jan 2026 (v1), last revised 13 Feb 2026 (v2)
Authors: Hailing Jin, Huiying Li
Abstract
Recent advances in semantic correspondence have been largely driven by the use of pre-trained large-scale models. However, a limitation of these approaches is their dependence on high-resolution input images to achieve optimal performance, which results in considerable computational overhead. In this work, we address a fundamental limitation in current methods: the irreversible fusion of adjacent keypoint features caused by deep downsampling operations. This issue is triggered when semantically distinct keypoints fall within the same downsampled receptive field (e.g., 16x16 patches). To address this issue, we present SimpleMatch, a simple yet effective framework for semantic correspondence that delivers strong performance even at low resolutions. We propose a lightweight upsample decoder that progressively recovers spatial detail by upsampling deep features to 1/4 resolution, and a multi-scale supervised loss that ensures the upsampled features retain discriminative features across different spatial scales. In addition, we introduce sparse ...
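The keypoint-fusion problem described in the abstract can be illustrated with a toy calculation. The sketch below treats each feature cell as a non-overlapping square patch, which simplifies a real receptive field, and checks which keypoints land in the same cell at stride 16 versus stride 4 (the 1/4 resolution mentioned above). The helper names and keypoint coordinates are hypothetical.

```python
def feature_cell(keypoint_xy, stride):
    """Map a pixel coordinate to the index of the feature cell that covers it."""
    x, y = keypoint_xy
    return (int(x) // stride, int(y) // stride)

def colliding_groups(keypoints, stride):
    """Return groups of keypoint names that share one feature cell."""
    cells = {}
    for name, xy in keypoints.items():
        cells.setdefault(feature_cell(xy, stride), []).append(name)
    return sorted(tuple(sorted(names)) for names in cells.values() if len(names) > 1)

# Hypothetical keypoints that sit close together in a low-resolution image.
keypoints = {"left_eye": (100, 60), "right_eye": (109, 62), "nose": (105, 70)}

coarse = colliding_groups(keypoints, stride=16)  # 16x16 patches
fine = colliding_groups(keypoints, stride=4)     # 1/4-resolution features

print("stride 16 collisions:", coarse)
print("stride 4 collisions:", fine)
```

In this example the two eye keypoints share a single cell at stride 16, so their features cannot be told apart, while at stride 4 every keypoint gets its own cell.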