[2602.13112] AdaGrad-Diff: A New Version of the Adaptive Gradient Algorithm
Summary
The paper introduces AdaGrad-Diff, an adaptive gradient algorithm that improves upon the traditional AdaGrad by adjusting the stepsize based on the cumulative squared norms of gradient differences, enhancing robustness in optimization tasks.
Why It Matters
This research addresses a common challenge of gradient-based optimization in machine learning: sensitivity to the choice of stepsize, which typically requires manual tuning. By proposing a new adaptive method, it offers a way to reduce tuning effort and improve robustness when training models.
Key Takeaways
- AdaGrad-Diff adapts the stepsize using cumulative squared norms of successive gradient differences rather than of the gradients themselves.
- This avoids unnecessary stepsize damping when gradients change little across iterations.
- Numerical experiments show AdaGrad-Diff is more robust than standard AdaGrad in several practically relevant settings.
- The approach can improve training performance across machine learning tasks.
- It reduces the need for manual stepsize tuning in gradient methods.
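To make the takeaways concrete, here is a minimal sketch contrasting classic (scalar-stepsize) AdaGrad with an AdaGrad-Diff-style update, inferred only from the abstract's description: the accumulator sums squared norms of successive gradient differences instead of squared gradient norms. The initialization of the previous gradient, the hyperparameter values, and the scalar (norm-based) form of the accumulator are assumptions for illustration; the paper's exact algorithm may differ.

```python
import numpy as np

def adagrad(grad, x0, eta=1.0, eps=1e-8, iters=500):
    # Classic AdaGrad (norm version): accumulate ||g_k||^2 over iterations.
    x = x0.astype(float).copy()
    acc = eps
    for _ in range(iters):
        g = grad(x)
        acc += g @ g
        x -= eta / np.sqrt(acc) * g
    return x

def adagrad_diff(grad, x0, eta=1.0, eps=1e-8, iters=500):
    # AdaGrad-Diff sketch: accumulate ||g_k - g_{k-1}||^2 instead, so the
    # stepsize is damped only when gradients fluctuate across iterations.
    # Initializing g_prev = 0 makes the first increment ||g_0||^2, matching
    # AdaGrad's first step (a convention assumed here, not from the paper).
    x = x0.astype(float).copy()
    acc = eps
    g_prev = np.zeros_like(x, dtype=float)
    for _ in range(iters):
        g = grad(x)
        d = g - g_prev
        acc += d @ d
        x -= eta / np.sqrt(acc) * g
        g_prev = g
    return x

# Toy quadratic f(x) = 0.5 * ||x||^2, whose gradient is simply x.
grad = lambda x: x
x0 = np.array([5.0, -3.0])
x_ada = adagrad(grad, x0)
x_diff = adagrad_diff(grad, x0)
```

On this quadratic the gradients change less and less as the iterates converge, so the Diff accumulator stops growing and the stepsize is not unnecessarily reduced, which is exactly the behavior the paper motivates.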
Statistics > Machine Learning
arXiv:2602.13112 (stat) [Submitted on 13 Feb 2026]
Authors: Matia Bojovic, Saverio Salzo, Massimiliano Pontil
Abstract: Vanilla gradient methods are often highly sensitive to the choice of stepsize, which typically requires manual tuning. Adaptive methods alleviate this issue and have therefore become widely used. Among them, AdaGrad has been particularly influential. In this paper, we propose an AdaGrad-style adaptive method in which the adaptation is driven by the cumulative squared norms of successive gradient differences rather than gradient norms themselves. The key idea is that when gradients vary little across iterations, the stepsize is not unnecessarily reduced, while significant gradient fluctuations, reflecting curvature or instability, lead to automatic stepsize damping. Numerical experiments demonstrate that the proposed method is more robust than AdaGrad in several practically relevant settings.
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)
Cite as: arXiv:2602.13112 [stat.ML] (arXiv:2602.13112v1 [stat.ML] for this version)
DOI: https://doi.org/10.48550/arXiv.2602.13112