[2510.09658] Gradient-Sign Masking for Task Vector Transport Across Pre-Trained Models
Summary
This paper presents Gradient-Sign Masking (GradFix), a method for transferring task vectors across pre-trained models without additional fine-tuning: the source task vector is masked to match the gradient-sign structure of the target model, estimated from a handful of labeled samples.
Why It Matters
When a new release of a foundation model is published, practitioners typically must repeat fine-tuning for tasks they had already solved on the previous version. This research offers a way to reuse existing task vectors instead, cutting the cost of repeated fine-tuning and making model adaptation more efficient as foundation models are updated.
Key Takeaways
- Gradient-Sign Masking allows for effective transfer of task vectors across different pre-trained models.
- The method requires no additional fine-tuning: it only computes a few target-model gradients on a handful of labeled samples, with no parameter updates.
- Empirical results show significant performance improvements on vision and language benchmarks.
- The approach guarantees that the transported vector is a first-order descent direction for the target loss, giving a theoretical grounding for its effectiveness.
- Transporting task vectors enhances multi-task and multi-source model merging capabilities.
Computer Science > Machine Learning
arXiv:2510.09658 (cs)
[Submitted on 7 Oct 2025 (v1), last revised 20 Feb 2026 (this version, v3)]
Title: Gradient-Sign Masking for Task Vector Transport Across Pre-Trained Models
Authors: Filippo Rinaldi, Aniello Panariello, Giacomo Salici, Fengyuan Liu, Marco Ciccone, Angelo Porrello, Simone Calderara
Abstract: When a new release of a foundation model is published, practitioners typically need to repeat fine-tuning, even if the same task was already tackled in the previous version. A promising alternative is to reuse the parameter changes (i.e., task vectors) that capture how a model adapts to a specific task. However, these vectors often fail to transfer across different pre-trained models because their parameter spaces are misaligned. In this work, we show that successful transfer depends strongly on the gradient-sign structure of the new model. Based on this insight, we propose GradFix, which approximates the ideal sign structure and leverages it to transfer knowledge using only a handful of labeled samples. Notably, this requires no additional fine-tuning: we only compute a few target-model gradients without parameter updates and mask the source task vector accordingly. This yields an update that is locally aligned with the target loss landscape, effectively rebasi...
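The abstract's masking idea can be illustrated with a toy sketch. This is a hedged, minimal reading of the approach, not the paper's implementation: the function `gradfix_mask` is a hypothetical name, the "gradient" is a stand-in computed in closed form for a quadratic loss, and the assumed masking rule is to keep only the task-vector entries whose sign agrees with the negative target gradient (i.e., entries that locally point downhill for the target model), zeroing the rest.

```python
import numpy as np

def gradfix_mask(task_vector, target_grad):
    """Keep task-vector entries whose sign matches the descent
    direction -target_grad; zero out the disagreeing entries.
    (Hypothetical sketch of a sign-based masking rule.)"""
    keep = np.sign(task_vector) == np.sign(-target_grad)
    return task_vector * keep

# Toy setup: gradient of the quadratic loss 0.5 * ||w - w_opt||^2
rng = np.random.default_rng(0)
w_target = rng.normal(size=8)     # new pre-trained weights (toy)
w_opt = rng.normal(size=8)        # task optimum (toy)
task_vector = rng.normal(size=8)  # delta fine-tuned on a *different* base model

grad = w_target - w_opt           # stand-in for a few-sample gradient estimate
masked = gradfix_mask(task_vector, grad)

# Every surviving entry opposes the gradient, so the masked update
# is a (non-strict) first-order descent direction: <masked, grad> <= 0.
print(float(masked @ grad) <= 0.0)  # True
```

The sign-agreement check is what yields the first-order descent property claimed in the takeaways: each retained coordinate contributes a non-positive term to the inner product with the target gradient.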