[2510.09658] Gradient-Sign Masking for Task Vector Transport Across Pre-Trained Models

arXiv - Machine Learning · 4 min read

Summary

This paper presents GradFix, a gradient-sign masking method that transports task vectors across different pre-trained models without additional fine-tuning, using only a handful of labeled samples to align the transferred update with the target model's loss landscape.

Why It Matters

When a new release of a foundation model is published, practitioners typically have to repeat fine-tuning for tasks they had already solved on the previous version. This work offers a way to reuse existing task vectors instead, cutting the cost of repeated fine-tuning and making model adaptation more efficient as foundation models are updated.

Key Takeaways

  • Gradient-sign masking transfers task vectors effectively across different pre-trained models.
  • No additional fine-tuning is required: only a few target-model gradients are computed, with no parameter updates (see the sketch after this list).
  • Empirical results show significant performance gains on vision and language benchmarks.
  • The masked update is guaranteed to be a first-order descent direction for the target loss (a short derivation follows the abstract below).
  • Transported task vectors also strengthen multi-task and multi-source model merging.
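
To make the mechanics concrete, here is a minimal PyTorch sketch of the procedure the abstract describes: compute a few target-model gradients, then keep only the task-vector coordinates whose sign agrees with the target's descent direction. It is illustrative only; `gradfix_transport`, the dict layout of `source_task_vector`, and the single few-shot batch are assumptions, not the authors' code.

```python
import torch

def gradfix_transport(target_model, source_task_vector, loss_fn, few_shot_batch):
    # Hypothetical sketch, not the authors' implementation.
    # source_task_vector: dict {param_name: theta_finetuned - theta_pretrained}
    # computed on the *source* pre-trained model; shapes must match the target.
    inputs, labels = few_shot_batch

    # 1) A single backward pass on a handful of labeled samples yields the
    #    target model's gradient signs. No optimizer step is taken.
    target_model.zero_grad()
    loss = loss_fn(target_model(inputs), labels)
    loss.backward()

    # 2) Keep only the coordinates where the source task vector already
    #    points downhill for the target model: sign(tau) == sign(-grad).
    with torch.no_grad():
        for name, param in target_model.named_parameters():
            if param.grad is None or name not in source_task_vector:
                continue
            tau = source_task_vector[name]
            mask = (torch.sign(tau) == torch.sign(-param.grad)).to(tau.dtype)
            param.add_(tau * mask)  # masked task vector, applied in place

    return target_model
```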

Computer Science > Machine Learning

arXiv:2510.09658 (cs) · Submitted on 7 Oct 2025 (v1), last revised 20 Feb 2026 (this version, v3)

Title: Gradient-Sign Masking for Task Vector Transport Across Pre-Trained Models
Authors: Filippo Rinaldi, Aniello Panariello, Giacomo Salici, Fengyuan Liu, Marco Ciccone, Angelo Porrello, Simone Calderara

Abstract: When a new release of a foundation model is published, practitioners typically need to repeat fine-tuning, even if the same task was already tackled in the previous version. A promising alternative is to reuse the parameter changes (i.e., task vectors) that capture how a model adapts to a specific task. However, these vectors often fail to transfer across different pre-trained models because their parameter spaces are misaligned. In this work, we show that successful transfer depends strongly on the gradient-sign structure of the new model. Based on this insight, we propose GradFix, which approximates the ideal sign structure and leverages it to transfer knowledge using only a handful of labeled samples. Notably, this requires no additional fine-tuning: we only compute a few target-model gradients without parameter updates and mask the source task vector accordingly. This yields an update that is locally aligned with the target loss landscape, effectively rebasing...
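
The "locally aligned" claim at the end of the excerpt (and the first-order descent takeaway above) follows directly from the masking rule. A one-line derivation, with notation that is ours rather than the paper's:

```latex
% tau: source task vector, g = \nabla L(\theta_t): target-model gradient,
% m_i = 1 if sign(tau_i) = sign(-g_i), and m_i = 0 otherwise (the mask).
\langle \nabla L(\theta_t),\; m \odot \tau \rangle
  \;=\; \sum_i m_i \, g_i \, \tau_i \;\le\; 0,
% since every coordinate the mask keeps satisfies g_i \tau_i \le 0.
% Hence the masked task vector is a non-ascent (first-order descent)
% direction for the target loss at \theta_t.
```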

