[2602.15490] RPT-SR: Regional Prior attention Transformer for infrared image Super-Resolution
Summary
The paper presents RPT-SR, a novel transformer architecture designed for infrared image super-resolution, addressing inefficiencies in existing models by incorporating regional prior tokens to enhance performance in surveillance and autonomous driving scenarios.
Why It Matters
This research is significant as it tackles the limitations of current super-resolution models in infrared imaging, which is crucial for applications like surveillance and autonomous vehicles. By introducing a dual-token framework, it enhances the understanding of scene layouts, potentially leading to improved image quality and operational efficiency in critical real-world applications.
Key Takeaways
- RPT-SR utilizes a dual-token framework for improved image reconstruction.
- The model addresses inefficiencies in existing super-resolution techniques for infrared images.
- It demonstrates state-of-the-art performance across diverse infrared datasets.
- Incorporates spatial priors to enhance the attention mechanism.
- Applicable to both Long-Wave and Short-Wave infrared spectra.
Computer Science > Computer Vision and Pattern Recognition
arXiv:2602.15490 (cs) [Submitted on 17 Feb 2026]
Title: RPT-SR: Regional Prior attention Transformer for infrared image Super-Resolution
Authors: Youngwan Jin, Incheol Park, Yagiz Nalcakan, Hyeongjin Ju, Sanghyeop Yeo, Shiho Kim
Abstract: General-purpose super-resolution models, particularly Vision Transformers, have achieved remarkable success but exhibit fundamental inefficiencies in common infrared imaging scenarios such as surveillance and autonomous driving, which operate from fixed or nearly static viewpoints. These models fail to exploit the strong, persistent spatial priors inherent in such scenes, leading to redundant learning and suboptimal performance. To address this, we propose the Regional Prior attention Transformer for infrared image Super-Resolution (RPT-SR), a novel architecture that explicitly encodes scene layout information into the attention mechanism. Our core contribution is a dual-token framework that fuses (1) learnable regional prior tokens, which act as a persistent memory for the scene's global structure, with (2) local tokens that capture the frame-specific content of the current input. By feeding both token types into the attention mechanism, our model allows the priors to dynamically modulate the local reconstruction process. ...
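The dual-token idea described in the abstract can be sketched in code. The following is a minimal, hypothetical PyTorch illustration, not the authors' implementation: learnable regional prior tokens (a persistent memory of scene layout) are concatenated with the frame-specific local tokens to form the keys and values of an attention layer, so the priors can modulate local reconstruction. All names, dimensions, and design details here are assumptions for illustration.

```python
import torch
import torch.nn as nn

class RegionalPriorAttention(nn.Module):
    """Hypothetical sketch of a dual-token attention block.

    Learnable prior tokens persist across inputs (scene memory);
    local tokens carry the current frame's content. Priors join
    the key/value set so attention can draw on scene structure.
    """

    def __init__(self, dim=64, num_priors=16, num_heads=4):
        super().__init__()
        # (1) learnable regional prior tokens, shared across all frames
        self.prior_tokens = nn.Parameter(torch.randn(1, num_priors, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, local_tokens):
        # local_tokens: (B, N, dim) -- frame-specific patch tokens
        b = local_tokens.size(0)
        priors = self.prior_tokens.expand(b, -1, -1)
        # keys/values include the priors, letting the scene memory
        # modulate how each local token is reconstructed
        kv = torch.cat([priors, local_tokens], dim=1)
        out, _ = self.attn(self.norm(local_tokens), kv, kv)
        return local_tokens + out  # residual connection

x = torch.randn(2, 256, 64)        # e.g. 16x16 grid of patch tokens
y = RegionalPriorAttention()(x)
print(y.shape)                     # same shape as the input tokens
```

In a super-resolution backbone, a block like this would replace plain self-attention inside each transformer stage; because the prior tokens are parameters rather than inputs, they accumulate the static layout of a fixed-viewpoint scene during training.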