[2602.13112] AdaGrad-Diff: A New Version of the Adaptive Gradient Algorithm


Summary

The paper introduces AdaGrad-Diff, an adaptive gradient algorithm that improves upon the traditional AdaGrad by adjusting the stepsize based on the cumulative squared norms of gradient differences, enhancing robustness in optimization tasks.

Why It Matters

This research addresses the common challenge of sensitivity in gradient-based optimization methods, particularly in machine learning. By proposing a new adaptive method, it offers potential improvements in training models, which can lead to better performance in various applications.

Key Takeaways

  • AdaGrad-Diff adapts the stepsize using cumulative squared norms of successive gradient differences rather than gradient norms themselves.
  • When gradients change little across iterations, the stepsize is not unnecessarily damped, while large gradient fluctuations trigger automatic damping.
  • Numerical experiments show AdaGrad-Diff is more robust than traditional AdaGrad in several practically relevant settings.
  • The approach can improve training performance in machine learning tasks.
  • It reduces the need for manual stepsize tuning in gradient-based methods.
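The contrast with standard AdaGrad can be made explicit. The paper's exact update rule is not reproduced in this summary, so the second line below is an assumed form inferred from the abstract's description (same notation as AdaGrad, with the accumulator swapped for gradient differences):

```latex
% AdaGrad: the stepsize shrinks with cumulative squared gradient norms
x_{t+1} = x_t - \frac{\eta}{\sqrt{\sum_{s=1}^{t} \lVert g_s \rVert^2}} \, g_t
% AdaGrad-Diff (assumed form): the accumulator uses successive gradient differences
x_{t+1} = x_t - \frac{\eta}{\sqrt{\sum_{s=1}^{t} \lVert g_s - g_{s-1} \rVert^2}} \, g_t
```

Under this form, near-constant gradients contribute almost nothing to the accumulator, so the stepsize is preserved during stable phases; rapidly changing gradients inflate it, damping the stepsize.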

Statistics > Machine Learning — arXiv:2602.13112 (stat.ML)
Submitted on 13 Feb 2026

Title: AdaGrad-Diff: A New Version of the Adaptive Gradient Algorithm
Authors: Matia Bojovic, Saverio Salzo, Massimiliano Pontil

Abstract: Vanilla gradient methods are often highly sensitive to the choice of stepsize, which typically requires manual tuning. Adaptive methods alleviate this issue and have therefore become widely used. Among them, AdaGrad has been particularly influential. In this paper, we propose an AdaGrad-style adaptive method in which the adaptation is driven by the cumulative squared norms of successive gradient differences rather than gradient norms themselves. The key idea is that when gradients vary little across iterations, the stepsize is not unnecessarily reduced, while significant gradient fluctuations, reflecting curvature or instability, lead to automatic stepsize damping. Numerical experiments demonstrate that the proposed method is more robust than AdaGrad in several practically relevant settings.

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)
DOI: https://doi.org/10.48550/arXiv.2602.13112 (arXiv-issued DOI via DataCite, registration pending)
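The description above can be turned into a small numerical sketch. The paper's exact algorithm is not given in this summary (e.g. per-coordinate versus global accumulators, or how the first iteration is initialized), so the following assumes a single scalar accumulator of squared gradient-difference norms; `adagrad_diff_step` and all parameter values are hypothetical, not the authors' implementation:

```python
import numpy as np

def adagrad_diff_step(x, grad, prev_grad, acc, lr=0.1, eps=1e-8):
    """One AdaGrad-Diff-style update (illustrative sketch).

    Where AdaGrad accumulates squared gradient norms, this accumulates
    squared norms of the differences between successive gradients, so the
    stepsize is damped only when gradients fluctuate across iterations.
    """
    acc += np.linalg.norm(grad - prev_grad) ** 2
    step = lr / np.sqrt(acc + eps)
    return x - step * grad, acc

# Toy problem: minimize f(x) = 0.5 * ||x||^2, whose gradient is x itself.
x = np.array([3.0, -2.0])
acc = 0.0
prev_grad = np.zeros_like(x)
for _ in range(200):
    grad = x  # gradient of 0.5 * ||x||^2
    x, acc = adagrad_diff_step(x, grad, prev_grad, acc)
    prev_grad = grad
```

On this smooth quadratic the gradient differences shrink as the iterates converge, so the accumulator grows slowly and the stepsize is not damped away — the behavior the abstract highlights for stable iterations.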
