[2603.11804] OSMDA: OpenStreetMap-based Domain Adaptation for Remote Sensing VLMs
Computer Science > Computer Vision and Pattern Recognition
arXiv:2603.11804 (cs)
[Submitted on 12 Mar 2026 (v1), last revised 25 Mar 2026 (this version, v2)]

Title: OSMDA: OpenStreetMap-based Domain Adaptation for Remote Sensing VLMs
Authors: Stefan Maria Ailuro, Mario Markov, Mohammad Mahdi, Delyan Boychev, Luc Van Gool, Danda Pani Paudel (INSAIT, Sofia University "St. Kliment Ohridski")

Abstract: Vision-Language Models (VLMs) adapted to remote sensing rely heavily on domain-specific image-text supervision, yet high-quality annotations for satellite and aerial imagery remain scarce and expensive to produce. Prevailing pseudo-labeling pipelines address this gap by distilling knowledge from large frontier models, but this dependence on large teachers is costly, limits scalability, and caps achievable performance at the ceiling of the teacher. We propose OSMDA: a self-contained domain adaptation framework that eliminates this dependency. Our key insight is that a capable base VLM can serve as its own annotation engine: by pairing aerial images with rendered OpenStreetMap (OSM) tiles, we leverage the optical character recognition and chart-comprehension capabilities of the model to generate captions enriched by OSM's vast auxiliary metadata. The model is then fine-tuned on the resulting corpus with satellite imag...
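The abstract's core step — pairing an aerial image with the rendered OSM tile covering the same footprint — can be illustrated with a minimal sketch. This assumes the standard slippy-map tiling scheme used by OSM tile servers; the captioning prompt at the end is a hypothetical placeholder, not the prompt used in the paper.

```python
import math

def deg2num(lat_deg: float, lon_deg: float, zoom: int) -> tuple[int, int]:
    """Convert WGS84 lat/lon to slippy-map tile indices (standard OSM scheme)."""
    lat_rad = math.radians(lat_deg)
    n = 2 ** zoom
    xtile = int((lon_deg + 180.0) / 360.0 * n)
    ytile = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return xtile, ytile

def osm_tile_url(lat_deg: float, lon_deg: float, zoom: int) -> str:
    """URL of the rendered OSM tile covering the given point."""
    x, y = deg2num(lat_deg, lon_deg, zoom)
    return f"https://tile.openstreetmap.org/{zoom}/{x}/{y}.png"

def build_caption_prompt(tile_url: str) -> str:
    # Hypothetical prompt shape: the base VLM would receive the aerial image
    # alongside this rendered tile and be asked to read labels, road names,
    # and land-use annotations off the map rendering (OCR / chart reading).
    return (
        "Using the labeled map tile as reference, describe the aerial image, "
        f"including named roads, buildings, and land use. Map: {tile_url}"
    )

# Example: the tile at the prime meridian / equator at zoom 1.
print(osm_tile_url(0.0, 0.0, 1))
```

The tile indices, once computed, let each aerial patch be matched to a map rendering that carries OSM's textual metadata into the VLM's visual context.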