[2602.14029] Why Self-Training Helps and Hurts: Denoising vs. Signal Forgetting

arXiv - Machine Learning · 4 min read

Summary

This paper investigates the dual effects of iterative self-training in machine learning, focusing on the balance between denoising and signal forgetting in overparameterized linear regression models.

Why It Matters

Self-training is widely used when labeled data is scarce, yet running it too long can degrade performance. Characterizing when iteration denoises and when it forgets signal gives practitioners principled guidance for early stopping in high-dimensional settings.

Key Takeaways

  • Iterative self-training exerts two opposing effects: denoising, which helps, and signal forgetting, which hurts.
  • The interaction of these forces yields a U-shaped test-risk curve and an optimal early-stopping time.
  • In spiked covariance models, iteration acts as a spectral filter that preserves strong eigendirections while suppressing weaker ones.
  • A new generalized cross-validation criterion enables data-driven selection of the stopping time.
  • Experiments validate the theoretical findings.
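As a rough illustration of the two competing forces, the sketch below simulates the procedure described in the abstract in plain NumPy: a min-norm interpolator is fit once to noisy labels, then repeatedly refit on fresh covariates with noiseless pseudo-labels. Because each refit is linear, the signal and noise contributions to the estimator can be tracked through the same projections separately. All dimensions and noise levels are illustrative assumptions, not the paper's settings, and this uses isotropic covariates rather than the paper's general covariance model:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 300                       # overparameterized: d > n
beta = rng.normal(size=d)
beta /= np.linalg.norm(beta)          # unit-norm true signal
sigma = 1.0                           # label-noise standard deviation

def min_norm_fit(X, y):
    # minimum-l2-norm interpolator (pseudoinverse solution)
    return np.linalg.pinv(X) @ y

# Iteration 0: fit the noisy labels once.
X0 = rng.normal(size=(n, d))
eps = sigma * rng.normal(size=n)
signal_part = min_norm_fit(X0, X0 @ beta)   # what the fit retains of the signal
noise_part = min_norm_fit(X0, eps)          # what the label noise contributes

bias2, var = [], []
for t in range(10):
    bias2.append(np.sum((beta - signal_part) ** 2))  # systematic error
    var.append(np.sum(noise_part ** 2))              # stochastic error
    # Self-training step: fresh covariates, noiseless pseudo-labels.
    # Refitting maps the estimator through the projection onto the new
    # row space, so both components pass through the same operator.
    Xt = rng.normal(size=(n, d))
    P = np.linalg.pinv(Xt) @ Xt          # projection onto row space of Xt
    signal_part = P @ signal_part
    noise_part = P @ noise_part

# Signal forgetting: bias2 grows across iterations.
# Denoising: var decays geometrically toward zero.
```

In this isotropic toy setting the two components shrink at comparable rates, so the run shows the growing bias and decaying variance rather than the full U-shaped risk curve; the paper's deterministic-equivalent recursions characterize exactly when their sum dips before rising.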

Statistics > Machine Learning · arXiv:2602.14029 (stat) · Submitted on 15 Feb 2026

Title: Why Self-Training Helps and Hurts: Denoising vs. Signal Forgetting
Authors: Mingqi Wu, Archer Y. Yang, Qiang Sun

Abstract: Iterative self-training (self-distillation) repeatedly refits a model on pseudo-labels generated by its own predictions. We study this procedure in overparameterized linear regression: an initial estimator is trained on noisy labels, and each subsequent iterate is trained on fresh covariates with noiseless pseudo-labels from the previous model. In the high-dimensional regime, we derive deterministic-equivalent recursions for the prediction risk and effective noise across iterations, and prove that the empirical quantities concentrate sharply around these limits. The recursion separates two competing forces: a systematic component that grows with iteration due to progressive signal forgetting, and a stochastic component that decays due to denoising via repeated data-dependent projections. Their interaction yields a $U$-shaped test-risk curve and an optimal early-stopping time. In spiked covariance models, iteration further acts as an iteration-dependent spectral filter that preserves strong eigendirections while suppressing weaker ones, inducing an implicit form of soft feature selection distinct from ridge regr...
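The spectral-filter claim in the abstract can be illustrated with a small simulation under a spiked diagonal covariance (a hypothetical setup, not the paper's construction): place equal signal in one strong and one weak eigendirection and watch how repeated pseudo-label refits treat them:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, T = 100, 300, 8
evals = np.ones(d)                   # bulk eigenvalues = 1
evals[0] = 50.0                      # one spiked (strong) eigendirection
sqrt_cov = np.sqrt(evals)            # diagonal covariance, so sqrt is entrywise

b = np.zeros(d)
b[0] = 1.0                           # signal in the strong direction
b[1] = 1.0                           # equal signal in a weak direction

strong, weak = [b[0]], [b[1]]
for t in range(T):
    # Fresh covariates ~ N(0, diag(evals)); refit on noiseless pseudo-labels,
    # which maps b through the projection onto the new row space.
    X = rng.normal(size=(n, d)) * sqrt_cov
    P = np.linalg.pinv(X) @ X
    b = P @ b
    strong.append(b[0])
    weak.append(b[1])

# The strong coordinate decays slowly (nearly preserved per step),
# while the weak coordinate is suppressed geometrically: soft feature selection.
```

Each projection retains the spiked direction almost entirely because that direction dominates the sampled row space, while a bulk direction survives only at roughly the rate n/d per step, which is the iteration-dependent filtering the abstract describes.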

