[2602.20796] Exploring the Impact of Parameter Update Magnitude on Forgetting and Generalization of Continual Learning
Summary
This article investigates how the magnitude of parameter updates affects forgetting and generalization in continual learning, proposing a hybrid update strategy that improves performance in deep neural networks.
Why It Matters
Understanding the impact of parameter update magnitude is crucial for developing efficient continual learning algorithms. This research addresses a gap in existing studies by providing theoretical insights and practical strategies that can enhance model performance, making it relevant for researchers and practitioners in machine learning.
Key Takeaways
- Parameter update magnitude significantly influences forgetting and generalization in continual learning.
- The study formalizes knowledge degradation as task-specific drift in parameter space.
- A hybrid parameter update strategy is proposed, adjusting update magnitude based on gradient directions.
- Experiments show that the hybrid approach outperforms standard training strategies.
- The findings unify frozen and initialized training paradigms within an optimization framework.
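The summary describes the proposed strategy only at a high level: the update magnitude is adjusted based on gradient directions, interpolating between frozen training (no update) and full initialized-style updates. As a purely illustrative sketch, and not the paper's actual algorithm, one could scale each step by how well the current gradient aligns with the previous task's drift direction (the function name `hybrid_update` and the cosine-based scaling rule are assumptions for illustration):

```python
import numpy as np

def hybrid_update(theta, grad, prev_task_dir, lr=0.1, eps=1e-12):
    """Illustrative hybrid update rule (hypothetical, not the paper's exact method).

    Shrinks the step when the new gradient conflicts with the previous
    task's drift direction in parameter space (large conflicting drift
    suggests more forgetting), and allows a larger step when they agree.
    """
    cos = float(grad @ prev_task_dir) / (
        np.linalg.norm(grad) * np.linalg.norm(prev_task_dir) + eps
    )
    # Map cosine in [-1, 1] to a magnitude scale in [0, 1]:
    # 0 behaves like frozen training, 1 like a full-magnitude update.
    scale = 0.5 * (1.0 + cos)
    return theta - lr * scale * grad

theta = np.array([1.0, -0.5])
grad = np.array([0.2, 0.4])

# Aligned with the previous task's direction: full-magnitude step.
print(hybrid_update(theta, grad, prev_task_dir=np.array([0.2, 0.4])))
# Directly conflicting: the scale collapses to 0, mimicking frozen training.
print(hybrid_update(theta, grad, prev_task_dir=np.array([-0.2, -0.4])))
```

The two extremes of the scale recover the frozen and initialized paradigms that the paper says its framework unifies, which is why a direction-dependent magnitude is a natural reading of the summary.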
Computer Science > Machine Learning
arXiv:2602.20796 (cs) [Submitted on 24 Feb 2026]
Title: Exploring the Impact of Parameter Update Magnitude on Forgetting and Generalization of Continual Learning
Authors: JinLi He, Liang Bai, Xian Yang
Abstract: The magnitude of parameter updates is considered a key factor in continual learning. However, most existing studies focus on designing diverse update strategies, while a theoretical understanding of the underlying mechanisms remains limited. We therefore characterize a model's forgetting from the perspective of parameter update magnitude and formalize it as knowledge degradation induced by task-specific drift in the parameter space, a phenomenon not fully captured in previous studies because they assume a unified parameter space. By deriving the optimal parameter update magnitude that minimizes forgetting, we unify two representative update paradigms, frozen training and initialized training, within an optimization framework for constrained parameter updates. Our theoretical results further reveal that task sequences with small parameter distances exhibit better generalization and less forgetting under frozen training than under initialized training. These theoretical insights inspire a novel hybrid parameter update strategy t...