[2507.04448] Transfer Learning in Infinite Width Feature Learning Networks


Summary

This article presents a theoretical framework for understanding transfer learning in infinitely wide neural networks, focusing on how pretraining can enhance generalization for target tasks.

Why It Matters

The research offers a predictive account of transfer learning, a critical technique in machine learning practice. Understanding when and why pretrained features improve generalization on a target task can guide the use of pretrained models across applied domains.

Key Takeaways

  • The study quantifies the impact of pretraining on generalization in neural networks.
  • Two scenarios are analyzed: fine-tuning, where a readout is trained on top of frozen source-induced features, and a jointly rich setting, where both tasks operate in the feature-learning regime.
  • The performance is influenced by data quantity, task alignment, and feature learning strength.
  • The theory is tested on both synthetic and real datasets, providing interpretable conclusions.
  • Adaptive kernels are identified as key components in understanding performance dynamics.

Computer Science > Machine Learning
arXiv:2507.04448 (cs) [Submitted on 6 Jul 2025 (v1), last revised 24 Feb 2026 (this version, v2)]

Title: Transfer Learning in Infinite Width Feature Learning Networks
Authors: Clarissa Lauditi, Blake Bordelon, Cengiz Pehlevan

Abstract: We develop a theory of transfer learning in infinitely wide neural networks under gradient flow that quantifies when pretraining on a source task improves generalization on a target task. We analyze both (i) fine-tuning, where the downstream predictor is trained on top of source-induced features, and (ii) a jointly rich setting, where both the pretraining and downstream tasks can operate in a feature-learning regime, but the downstream model is initialized with the features obtained after pretraining. In this setup, the summary statistics of randomly initialized networks after rich pretraining are adaptive kernels which depend on both source data and labels. For (i), we analyze the performance of a readout for different pretraining data regimes. For (ii), the summary statistics after learning the target task are still adaptive kernels with features from both source and target tasks. We test our theory on linear and polynomial regression tasks as well as real datasets. Our theory allows interpretable conclusions on performance, which depend on the amou...
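To make the fine-tuning scenario (i) concrete, here is a minimal numpy sketch of the general idea: a ridge readout trained on a small target dataset does better on top of features that encode the source direction than on top of purely random features, when source and target tasks are aligned. This is an illustrative toy, not the paper's actual gradient-flow dynamics — the "pretrained" first layer is emulated by spiking random weights along the source direction, and all names and constants (`d`, `width`, the 0.95 alignment, the spike strength) are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, width, n_tgt, n_test, ridge = 50, 100, 20, 1000, 1e-3

# Source and target are linear tasks with strongly aligned teacher vectors.
w_src = rng.normal(size=d)
w_src /= np.linalg.norm(w_src)
v = rng.normal(size=d)
v -= (v @ w_src) * w_src            # component orthogonal to the source
v /= np.linalg.norm(v)
w_tgt = 0.95 * w_src + np.sqrt(1 - 0.95**2) * v

X_tgt = rng.normal(size=(n_tgt, d));   y_tgt = X_tgt @ w_tgt
X_test = rng.normal(size=(n_test, d)); y_test = X_test @ w_tgt

# Random first-layer features vs. the same features spiked along w_src
# (a hand-made stand-in for source-induced feature learning).
W_rand = rng.normal(size=(width, d)) / np.sqrt(d)
W_pre = W_rand + 3.0 * rng.normal(size=(width, 1)) * w_src

def readout_error(W):
    """Ridge-fit a linear readout on the target set; return test MSE."""
    H, H_test = X_tgt @ W.T, X_test @ W.T
    a = np.linalg.solve(H.T @ H + ridge * np.eye(width), H.T @ y_tgt)
    return float(np.mean((H_test @ a - y_test) ** 2))

err_pre, err_rand = readout_error(W_pre), readout_error(W_rand)
print(f"pretrained features: {err_pre:.3f}  random features: {err_rand:.3f}")
```

With only 20 target samples in 50 dimensions, the random-feature readout is badly underdetermined, while the source-spiked features already contain most of the target direction, so their readout generalizes markedly better — the qualitative data-quantity and task-alignment dependence the paper quantifies.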
