[2602.20062] A Theory of How Pretraining Shapes Inductive Bias in Fine-Tuning


Summary

This paper develops a theoretical framework for how pretraining shapes the inductive bias of fine-tuning, derived analytically for diagonal linear networks and validated empirically in nonlinear networks.

Why It Matters

Understanding how pretraining shapes fine-tuning is central to improving generalization in modern machine-learning pipelines. By characterizing how initialization choices govern feature reuse and refinement, the theory helps predict when fine-tuning a pretrained model will benefit a given downstream task.

Key Takeaways

  • Different initialization choices place the network into four distinct fine-tuning regimes.
  • A smaller initialization scale in earlier layers enables both feature reuse and feature refinement.
  • The theory yields exact expressions for the generalization error as a function of initialization parameters and task statistics.
  • Empirical results in nonlinear networks confirm the theoretical predictions.
  • The interaction between task statistics and initialization determines which regime is beneficial for fine-tuning.

arXiv:2602.20062 (cs) [Submitted on 23 Feb 2026]
Title: A Theory of How Pretraining Shapes Inductive Bias in Fine-Tuning
Authors: Nicolas Anguita, Francesco Locatello, Andrew M. Saxe, Marco Mondelli, Flavia Mancini, Samuel Lippl, Clementine Domine

Abstract: Pretraining and fine-tuning are central stages in modern machine learning systems. In practice, feature learning plays an important role across both stages: deep neural networks learn a broad range of useful features during pretraining and further refine those features during fine-tuning. However, an end-to-end theoretical understanding of how choices of initialization impact the ability to reuse and refine features during fine-tuning has remained elusive. Here we develop an analytical theory of the pretraining-fine-tuning pipeline in diagonal linear networks, deriving exact expressions for the generalization error as a function of initialization parameters and task statistics. We find that different initialization choices place the network into four distinct fine-tuning regimes that are distinguished by their ability to support feature learning and reuse, and therefore by the task statistics for which they are beneficial. In particular, a smaller initialization scale in earlier layers enables the network to both reuse and re...
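
To make the object of study concrete, below is a minimal NumPy sketch of a two-layer diagonal linear network run through a pretrain-then-fine-tune pipeline, with the earlier layer's initialization scale exposed as a knob. The teachers, scales, learning rate, step counts, and error proxy are illustrative assumptions for this sketch, not the paper's actual protocol or its exact generalization-error expressions.

```python
import numpy as np

rng = np.random.default_rng(0)


def train(u, v, X, y, lr=0.02, steps=5000):
    """Full-batch gradient descent on the diagonal linear network f(x) = <u * v, x>."""
    n = len(y)
    for _ in range(steps):
        beta = u * v                          # effective linear predictor
        grad_beta = X.T @ (X @ beta - y) / n  # gradient of 0.5 * MSE w.r.t. beta
        # Chain rule through the u * v parameterization; update both factors together.
        u, v = u - lr * grad_beta * v, v - lr * grad_beta * u
    return u, v


d, n = 50, 40
X_pre = rng.standard_normal((n, d))  # pretraining inputs
X_ft = rng.standard_normal((n, d))   # fine-tuning inputs

# Hypothetical teachers: a sparse pretraining task, and a fine-tuning task
# that reuses part of its support (so feature reuse can pay off).
beta_pre = np.zeros(d); beta_pre[:5] = 1.0
beta_ft = np.zeros(d); beta_ft[:8] = 1.0
y_pre, y_ft = X_pre @ beta_pre, X_ft @ beta_ft


def pipeline(alpha_first, alpha_second):
    """Pretrain from layer-wise init scales, then fine-tune (warm-started) on the new task."""
    v = alpha_first * np.ones(d)      # "earlier layer" weights
    u = alpha_second * np.ones(d)     # "later layer" weights
    u, v = train(u, v, X_pre, y_pre)  # pretraining stage
    u, v = train(u, v, X_ft, y_ft)    # fine-tuning stage
    # Parameter-space error to the fine-tuning teacher: a proxy for the
    # population error under isotropic inputs, not the paper's exact expression.
    return float(np.mean((u * v - beta_ft) ** 2))


print("small earlier-layer init:", pipeline(alpha_first=1e-3, alpha_second=1.0))
print("large earlier-layer init:", pipeline(alpha_first=1.0, alpha_second=1.0))
```

Sweeping alpha_first and the overlap between the two teachers is one way to probe, empirically, how the interaction between initialization and task statistics shapes fine-tuning behavior in this toy setting.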

