[2602.20062] A Theory of How Pretraining Shapes Inductive Bias in Fine-Tuning
Summary
This paper presents a theoretical framework for how pretraining shapes the inductive bias of fine-tuning, developed analytically in diagonal linear networks and validated empirically in nonlinear networks.
Why It Matters
Understanding how pretraining shapes fine-tuning is crucial for improving model generalization. This work shows how initialization choices govern the ability to reuse and refine learned features, clarifying which regimes benefit which task statistics.
Key Takeaways
- Different initialization choices place the network into four distinct fine-tuning regimes.
- Smaller initialization scales in earlier layers enhance feature reuse and refinement.
- The study derives exact expressions for the generalization error as a function of initialization parameters and task statistics.
- Empirical results confirm the theoretical findings in nonlinear networks.
- The interaction between data and initialization is pivotal for fine-tuning success.
Abstract
Computer Science > Machine Learning, arXiv:2602.20062 (cs). Submitted on 23 Feb 2026.
Authors: Nicolas Anguita, Francesco Locatello, Andrew M. Saxe, Marco Mondelli, Flavia Mancini, Samuel Lippl, Clementine Domine
Pretraining and fine-tuning are central stages in modern machine learning systems. In practice, feature learning plays an important role across both stages: deep neural networks learn a broad range of useful features during pretraining and further refine those features during fine-tuning. However, an end-to-end theoretical understanding of how choices of initialization impact the ability to reuse and refine features during fine-tuning has remained elusive. Here we develop an analytical theory of the pretraining-fine-tuning pipeline in diagonal linear networks, deriving exact expressions for the generalization error as a function of initialization parameters and task statistics. We find that different initialization choices place the network into four distinct fine-tuning regimes that are distinguished by their ability to support feature learning and reuse, and therefore by the task statistics for which they are beneficial. In particular, a smaller initialization scale in earlier layers enables the network to both reuse and refine…
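To make the abstract's setting concrete, here is a minimal sketch of a two-layer diagonal linear network, f(x) = Σᵢ uᵢvᵢxᵢ, trained by gradient descent at two initialization scales. This is not the paper's exact analysis: the data, teacher vector, learning rate, and step count are invented for illustration, and only the qualitative effect of initialization scale is meant to carry over.

```python
import numpy as np

# Toy diagonal linear network: effective predictor beta = u * v (elementwise).
# We compare a large vs. a small initialization scale for (u, v) on a sparse
# regression task; all data and hyperparameters here are illustrative.

rng = np.random.default_rng(0)
d, n = 50, 20                       # overparameterized: more features than samples
X = rng.standard_normal((n, d))
beta_star = np.zeros(d)
beta_star[:3] = 1.0                 # sparse ground-truth regression vector
y = X @ beta_star                   # noiseless targets

def train(scale, lr=0.005, steps=5000):
    """Full-batch GD on 0.5/n * ||X @ (u*v) - y||^2 from init u = v = scale."""
    u = np.full(d, scale)
    v = np.full(d, scale)
    for _ in range(steps):
        grad_beta = X.T @ (X @ (u * v) - y) / n   # gradient w.r.t. beta = u*v
        # chain rule: dL/du = grad_beta * v, dL/dv = grad_beta * u
        u, v = u - lr * grad_beta * v, v - lr * grad_beta * u
    return u * v                    # effective linear predictor

results = {scale: train(scale) for scale in (1.0, 1e-3)}
for scale, beta in results.items():
    print(f"init scale {scale}: ||beta - beta_star|| = "
          f"{np.linalg.norm(beta - beta_star):.3f}")
```

In this kind of model, a small initialization scale biases gradient descent toward sparse, feature-selective solutions that recover the sparse teacher, while a large scale keeps the dynamics close to a dense, "lazy" solution — a toy analogue of the initialization-dependent regimes the paper characterizes.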