[2510.12764] AnyUp: Universal Feature Upsampling
Summary
The paper presents AnyUp, a feature upsampling method that can be applied to any vision feature at any resolution without encoder-specific training, while also improving upsampling quality.
Why It Matters
Existing learning-based upsamplers for features such as DINO or CLIP must be retrained for every feature extractor, so they do not generalize to other feature types at inference time. AnyUp's feature-agnostic architecture removes that per-encoder retraining step, which makes it relevant to researchers and practitioners who work with pretrained vision features.
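To make "feature upsampling" concrete, the sketch below shows plain bilinear interpolation of a feature map, a training-free baseline that works with any encoder. It is not the AnyUp architecture, which this page does not describe in detail. The function name `bilinear_upsample`, the nested-list `[row][col][channel]` layout, and the toy 2x2 input are assumptions made purely for illustration.

```python
from typing import List

# A feature map indexed as [row][col][channel] (illustrative layout, not from the paper).
FeatureMap = List[List[List[float]]]


def bilinear_upsample(features: FeatureMap, out_h: int, out_w: int) -> FeatureMap:
    """Resize an (H, W, C) feature map to (out_h, out_w, C) by bilinear interpolation."""
    in_h, in_w = len(features), len(features[0])
    channels = len(features[0][0])
    out: FeatureMap = []
    for i in range(out_h):
        # Map the centre of output row i into input coordinates, clamped to the grid.
        y = min(max((i + 0.5) * in_h / out_h - 0.5, 0.0), in_h - 1.0)
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            x = min(max((j + 0.5) * in_w / out_w - 0.5, 0.0), in_w - 1.0)
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            pixel = []
            for c in range(channels):
                # Blend the four neighbouring feature vectors, channel by channel.
                top = (1 - wx) * features[y0][x0][c] + wx * features[y0][x1][c]
                bottom = (1 - wx) * features[y1][x0][c] + wx * features[y1][x1][c]
                pixel.append((1 - wy) * top + wy * bottom)
            row.append(pixel)
        out.append(row)
    return out


# Toy 2x2 map with 2 channels; the second channel is constant.
coarse = [[[0.0, 1.0], [1.0, 1.0]],
          [[2.0, 1.0], [3.0, 1.0]]]
fine = bilinear_upsample(coarse, 4, 4)
print(len(fine), len(fine[0]), len(fine[0][0]))  # 4 4 2
```

Because this baseline only blends neighbouring feature vectors, it needs no training for any encoder, but it also cannot recover detail finer than the coarse grid. Learned upsamplers aim to improve on that quality.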
Key Takeaways
- AnyUp allows for feature upsampling without encoder-specific training.
- In the authors' experiments, it sets a new state of the art for upsampled features.
- The method generalizes across different feature types while preserving semantics.
- AnyUp is efficient and easy to apply to various downstream tasks.
- Downstream vision applications that rely on dense features may benefit from the higher-quality upsampled features.
arXiv:2510.12764 (cs)
Submitted on 14 Oct 2025 (v1), last revised 16 Feb 2026 (this version, v2)
Title: AnyUp: Universal Feature Upsampling
Authors: Thomas Wimmer, Prune Truong, Marie-Julie Rakotosaona, Michael Oechsle, Federico Tombari, Bernt Schiele, Jan Eric Lenssen
Abstract: We introduce AnyUp, a method for feature upsampling that can be applied to any vision feature at any resolution, without encoder-specific training. Existing learning-based upsamplers for features like DINO or CLIP need to be re-trained for every feature extractor and thus do not generalize to different feature types at inference time. In this work, we propose an inference-time feature-agnostic upsampling architecture to alleviate this limitation and improve upsampling quality. In our experiments, AnyUp sets a new state of the art for upsampled features, generalizes to different feature types, and preserves feature semantics while being efficient and easy to apply to a wide range of downstream tasks.
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as: arXiv:2510.12764 [cs.CV] (or arXiv:2510.12764v2 [cs.CV] for this version)
DOI: https://doi.org/10.48550/arXiv.2510.12764 (arXiv-issued DOI via DataCite)
Submission history: From: Thomas Wimmer [vie...