[2603.02092] Adam Converges Without Any Modification On Update Rules
arXiv:2603.02092 (cs), Computer Science > Machine Learning
Submitted on 2 Mar 2026
Authors: Yushun Zhang, Bingran Li, Congliang Chen, Zhi-Quan Luo, Ruoyu Sun

Abstract: Adam is the default algorithm for training neural networks, including large language models (LLMs). However, Reddi et al. (2019) provided an example in which Adam diverges, raising concerns about its deployment in AI model training. We identify a key mismatch between the divergence example and practice: Reddi et al. (2019) pick the problem after picking the hyperparameters of Adam, i.e., $(\beta_1,\beta_2)$, while practical applications often fix the problem first and then tune $(\beta_1,\beta_2)$. In this work, we prove that Adam converges with proper problem-dependent hyperparameters. First, we prove that Adam converges when $\beta_2$ is large and $\beta_1 < \sqrt{\beta_2}$. Second, when $\beta_2$ is small, we point out a region of $(\beta_1,\beta_2)$ combinations where Adam can diverge to infinity. Our results indicate a phase transition for Adam from divergence to convergence when changing the $(\beta_1, \beta_2)$ combination. To our knowledge, this is the first phase transition in the $(\beta_1,\beta_2)$ 2D plane reported in the literature, providing rigorous theoretical guarantees for the Adam optimizer. We ...
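
To make the roles of $\beta_1$ and $\beta_2$ concrete, here is a minimal sketch of one scalar Adam step in its commonly used form (with bias correction), together with a check of the condition $\beta_1 < \sqrt{\beta_2}$ quoted in the abstract. The function names `adam_step` and `satisfies_beta_condition` are hypothetical, chosen for illustration only. The abstract does not say which exact variant of Adam the analysis covers, nor how large $\beta_2$ must be for a given problem, so this sketch checks only the $\beta_1 < \sqrt{\beta_2}$ part and should not be read as the paper's convergence criterion.

```python
import math


def satisfies_beta_condition(beta1: float, beta2: float) -> bool:
    """Check beta1 < sqrt(beta2), the inequality quoted in the abstract.

    Assumption: the additional requirement that beta2 be "large" is
    problem-dependent and is NOT checked here.
    """
    return beta1 < math.sqrt(beta2)


def adam_step(x, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One scalar Adam step in the commonly used bias-corrected form.

    x: current parameter, grad: gradient at x,
    m, v: running first and second moment estimates,
    t: 1-based step counter.
    Returns the updated (x, m, v).
    """
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment average
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment average
    m_hat = m / (1.0 - beta1 ** t)                # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    x = x - lr * m_hat / (math.sqrt(v_hat) + eps)
    return x, m, v


if __name__ == "__main__":
    # Widely used defaults (0.9, 0.999) satisfy the inequality.
    print(satisfies_beta_condition(0.9, 0.999))
    # A small beta2 paired with the same beta1 does not.
    print(satisfies_beta_condition(0.9, 0.5))

    # Illustrative run on f(x) = x^2, whose gradient is 2x.
    x, m, v = 1.0, 0.0, 0.0
    for t in range(1, 101):
        x, m, v = adam_step(x, 2.0 * x, m, v, t)
    print(x)
```

The check is deliberately separated from the update so that a $(\beta_1, \beta_2)$ pair can be screened before training; passing it does not by itself guarantee convergence.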