[2602.15539] Dynamic Training-Free Fusion of Subject and Style LoRAs
Summary
The paper presents a dynamic, training-free fusion framework for combining subject and style LoRAs in diffusion-based generative models, improving the coherence of generated outputs without any retraining.
Why It Matters
This research addresses limitations in existing LoRA fusion methods by introducing a dynamic approach that adapts during the generation process, improving the quality of synthesized outputs. It has implications for advancements in computer vision and generative AI, particularly in applications requiring nuanced subject and style integration.
Key Takeaways
- Introduces a dynamic framework for LoRA fusion that operates during generation.
- Utilizes KL divergence for adaptive weight selection in feature fusion.
- Implements gradient-based corrections for enhanced semantic guidance.
- Demonstrates superior performance compared to traditional static methods.
- Applicable across various subject-style combinations in generative tasks.
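The KL-based weight selection in the takeaways above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the helper names (`fuse_lora_features`, `kl_divergence`) and the inverse-divergence weighting rule are assumptions chosen to show the idea that a LoRA whose per-layer features stay closer to the base model's distribution receives a larger fusion weight.

```python
import numpy as np

def softmax(x):
    """Turn a feature vector into a discrete distribution."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def kl_divergence(p, q, eps=1e-8):
    """KL(p || q) for two discrete distributions."""
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q)))

def fuse_lora_features(base_feat, subject_feat, style_feat):
    """Hypothetical per-layer fusion: weight each LoRA's output
    inversely to its KL divergence from the base model's features,
    so the LoRA that deviates less contributes more."""
    p = softmax(base_feat)
    kl_subj = kl_divergence(p, softmax(subject_feat))
    kl_style = kl_divergence(p, softmax(style_feat))
    w_subj = 1.0 / (kl_subj + 1e-8)
    w_style = 1.0 / (kl_style + 1e-8)
    total = w_subj + w_style
    w_subj, w_style = w_subj / total, w_style / total
    fused = w_subj * subject_feat + w_style * style_feat
    return fused, (w_subj, w_style)
```

Because the weights are recomputed per layer and per sampled input, the fusion adapts to the randomness of each generation rather than relying on a single static merge ratio.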
Computer Science > Computer Vision and Pattern Recognition
arXiv:2602.15539 (cs) [Submitted on 17 Feb 2026]
Title: Dynamic Training-Free Fusion of Subject and Style LoRAs
Authors: Qinglong Cao, Yuntian Chen, Chao Ma, Xiaokang Yang
Abstract: Recent studies have explored the combination of multiple LoRAs to simultaneously generate user-specified subjects and styles. However, most existing approaches fuse LoRA weights using static statistical heuristics that deviate from LoRA's original purpose of learning adaptive feature adjustments and ignore the randomness of sampled inputs. To address this, we propose a dynamic training-free fusion framework that operates throughout the generation process. During the forward pass, at each LoRA-applied layer, we dynamically compute the KL divergence between the base model's original features and those produced by the subject and style LoRAs, respectively, and adaptively select the most appropriate weights for fusion. In the reverse denoising stage, we further refine the generation trajectory by dynamically applying gradient-based corrections derived from objective metrics such as CLIP and DINO scores, providing continuous semantic and stylistic guidance. By integrating these two complementary mechanisms, feature-level selection and metric-guided latent adjustment, across the entire diffusion timel...
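The abstract's second mechanism, gradient-based correction of the denoising trajectory, can be sketched generically. This is a simplified stand-in, not the paper's method: `score_fn` is a placeholder for a differentiable objective such as a CLIP or DINO similarity score, and the central-difference gradient is used here only to keep the example self-contained (a real implementation would backpropagate through the score model).

```python
import numpy as np

def latent_correction(latent, score_fn, step=0.1, eps=1e-3):
    """Nudge a latent along the (numerical) gradient of a scalar
    objective, standing in for CLIP/DINO-score guidance applied
    between denoising steps."""
    grad = np.zeros_like(latent)
    for i in range(latent.size):
        pert = np.zeros_like(latent)
        pert.flat[i] = eps
        # central finite difference of the score w.r.t. this coordinate
        grad.flat[i] = (score_fn(latent + pert) - score_fn(latent - pert)) / (2 * eps)
    # ascend the score: higher semantic/stylistic alignment
    return latent + step * grad
```

Applied at each reverse-diffusion timestep, such a correction steers the trajectory toward latents that the objective metrics rate as more faithful to the target subject and style.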