[2602.24159] RAViT: Resolution-Adaptive Vision Transformer
Computer Science > Computer Vision and Pattern Recognition
arXiv:2602.24159 (cs)
[Submitted on 27 Feb 2026]
Authors: Martial Guidez, Stefan Duffner, Christophe Garcia

Abstract: Vision transformers have recently made a breakthrough in computer vision, showing excellent performance in terms of precision for numerous applications. However, their computational cost is very high compared to alternative approaches such as Convolutional Neural Networks. To address this problem, we propose a novel framework for image classification called RAViT, based on a multi-branch network that operates on several copies of the same image at different resolutions to reduce the computational cost while preserving the overall accuracy. Furthermore, our framework includes an early-exit mechanism that makes our model adaptive and allows choosing the appropriate trade-off between accuracy and computational cost at run-time. For example, in a two-branch architecture, the original image is first resized to reduce its resolution, and a first transformer performs a prediction on the resized image. The resulting prediction is then reused together with the original-size image to perform a final prediction with a second transformer, using less computation than a classical Vision Transformer architecture. The early-exit proc...
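
The two-branch flow described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract is cut off before it describes the early-exit procedure, so the confidence threshold used here as the exit criterion is an assumption, as are all names (`downscale`, `two_branch_predict`, `toy_low_res`, `toy_full_res`). The two branches are passed in as plain callables that stand in for the transformers, and average pooling stands in for image resizing.

```python
import math
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel values


def downscale(image: Image, factor: int) -> Image:
    """Average-pool by an integer factor (a simple stand-in for resizing)."""
    h, w = len(image), len(image[0])
    return [
        [
            sum(image[y + dy][x + dx] for dy in range(factor) for dx in range(factor))
            / factor**2
            for x in range(0, w - factor + 1, factor)
        ]
        for y in range(0, h - factor + 1, factor)
    ]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def two_branch_predict(
    image: Image,
    low_res_branch: Callable[[Image], List[float]],
    full_res_branch: Callable[[Image, List[float]], List[float]],
    threshold: float,
    factor: int = 2,
) -> Tuple[int, int]:
    """Return (predicted class, number of branches evaluated).

    ASSUMPTION: the exit test (max softmax probability >= threshold) is a
    hypothetical criterion; the source text does not specify one.
    """
    # Branch 1: predict on the reduced-resolution copy.
    first_logits = low_res_branch(downscale(image, factor))
    probs = softmax(first_logits)
    confidence = max(probs)
    if confidence >= threshold:
        return probs.index(confidence), 1  # early exit, branch 2 skipped

    # Branch 2: reuse the first prediction together with the full-size image.
    final_probs = softmax(full_res_branch(image, first_logits))
    return final_probs.index(max(final_probs)), 2


def _mean(image: Image) -> float:
    return sum(map(sum, image)) / (len(image) * len(image[0]))


def toy_low_res(image: Image) -> List[float]:
    """Hypothetical stand-in for the first transformer (two classes)."""
    m = _mean(image)
    return [m, 1.0 - m]


def toy_full_res(image: Image, prior_logits: List[float]) -> List[float]:
    """Hypothetical stand-in for the second transformer; adds to the prior."""
    m = _mean(image)
    return [prior_logits[0] + m, prior_logits[1] + (1.0 - m)]
```

Under this sketch, `threshold` is the run-time knob: a low value lets most inputs exit after the cheap low-resolution branch, while a value above 1 forces every input through both branches.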