[2602.23777] Reasoning-Driven Multimodal LLM for Domain Generalization
Computer Science > Artificial Intelligence
arXiv:2602.23777 (cs)
[Submitted on 27 Feb 2026]

Title: Reasoning-Driven Multimodal LLM for Domain Generalization
Authors: Zhipeng Xu, Zilong Wang, Xinyang Jiang, Dongsheng Li, De Cheng, Nannan Wang

Abstract: This paper addresses the domain generalization (DG) problem in deep learning. While most DG methods focus on enforcing visual feature invariance, we leverage the reasoning capability of multimodal large language models (MLLMs) and explore the potential of constructing reasoning chains that derive image categories, yielding more robust predictions under domain shift. To this end, we systematically study the role of reasoning in DG using DomainBed-Reasoning, a newly constructed extension of the DomainBed dataset in which each sample is paired with class-relevant reasoning chains. Our analysis reveals two key challenges: (i) fine-tuning MLLMs with reasoning chains for classification is more challenging than direct label supervision, since the model must optimize complex reasoning sequences before label prediction; and (ii) mismatches in reasoning patterns between supervision signals and fine-tuned MLLMs lead to a trade-off between semantic richness (informative but harder to optimize) and optimization efficiency (easier to ...