[2603.24518] TuneShift-KD: Knowledge Distillation and Transfer for Fine-tuned Models
arXiv:2603.24518 (cs)
[Submitted on 25 Mar 2026]

Authors: Yushi Guan, Jeanine Ohene-Agyei, Daniel Kwan, Jean Sebastien Dandurand, Yifei Zhang, Nandita Vijaykumar

Abstract: To embed domain-specific or specialized knowledge into pre-trained foundation models, fine-tuning using techniques such as parameter efficient fine-tuning (e.g. LoRA) is a common practice. However, as new LLM architectures and pre-trained models emerge, transferring this specialized knowledge to newer models becomes an important task. In many scenarios, the original specialized data may be unavailable due to privacy or commercial restrictions, necessitating distillation and transfer of this specialized knowledge from the fine-tuned base model to a different pre-trained model. We present TuneShift-KD, a novel approach that automatically distills specialized knowledge from a fine-tuned model to a target model using only a few examples representative of the specialized information. Our key insight is that specialized knowledge can be identified through perplexity differences between base and fine-tuned models: prompts where the fine-tuned model responds confidently (low perplexity), but the base model struggles (high perplexity), indicate queries corres...
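
The perplexity comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the perplexity gap is measured or thresholded, so the function names, the record layout, and the two cutoffs (`max_ft_ppl`, `min_ratio`) are all assumptions made for illustration. The per-token log-probabilities are assumed to have been computed already by each model on the same response text.

```python
import math
from typing import Dict, List, Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp(-mean of natural-log token probabilities)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def select_specialized(
    records: List[Dict],
    max_ft_ppl: float = 2.0,   # hypothetical cutoff: fine-tuned model is "confident"
    min_ratio: float = 3.0,    # hypothetical cutoff: base model "struggles" by comparison
) -> List[str]:
    """Return prompts where the fine-tuned model has low perplexity and the
    base model has much higher perplexity on the same response."""
    selected = []
    for rec in records:
        ft_ppl = perplexity(rec["finetuned_logprobs"])
        base_ppl = perplexity(rec["base_logprobs"])
        if ft_ppl <= max_ft_ppl and base_ppl / ft_ppl >= min_ratio:
            selected.append(rec["prompt"])
    return selected


# Toy log-probabilities, invented purely for this illustration.
records = [
    {   # fine-tuned confident, base struggles: looks specialized
        "prompt": "domain question",
        "finetuned_logprobs": [-0.1, -0.2, -0.1],
        "base_logprobs": [-2.0, -1.5, -2.5],
    },
    {   # both models confident: general knowledge, not specialized
        "prompt": "general question",
        "finetuned_logprobs": [-0.2, -0.1],
        "base_logprobs": [-0.3, -0.2],
    },
    {   # fine-tuned model itself is unsure: filtered out
        "prompt": "unfamiliar question",
        "finetuned_logprobs": [-1.5, -2.0],
        "base_logprobs": [-2.5, -3.0],
    },
]

specialized_prompts = select_specialized(records)
```

Using a ratio rather than an absolute gap is one possible design choice; it keeps the test scale-free across prompts of different difficulty, but a difference-based criterion would fit the abstract's description equally well.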