[2603.20116] Chain-of-Adaptation: Surgical Vision-Language Adaptation with Reinforcement Learning
[Submitted on 20 Mar 2026]

Authors: Jiajie Li, Chenhui Xu, Meihuan Liu, Jinjun Xiong

Abstract: Conventional fine-tuning on domain-specific datasets can inadvertently alter a model's pretrained multimodal priors, leading to reduced generalization. To address this, we propose Chain-of-Adaptation (CoA), an adaptation framework designed to integrate domain knowledge while maintaining the model's inherent reasoning and perceptual capabilities. CoA uses reinforcement learning to introduce a structured reasoning format that enhances domain alignment without sacrificing general multimodal competence. Experiments on standard surgical benchmarks, under both in-distribution and out-of-distribution settings, demonstrate that CoA achieves higher accuracy, stronger generalization, and more stable behavior than supervised fine-tuning. Furthermore, ablation studies confirm that CoA effectively preserves the model's core vision-language abilities, providing a reliable pathway for domain specialization in vision-language models (VLMs).

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Cite as: arXiv:2603.20116 [cs.CV] (or arXiv:2603.2...