[2411.07102] Effectively Leveraging Momentum Terms in Stochastic Line Search Frameworks for Fast Optimization of Finite-Sum Problems

arXiv - Machine Learning · 4 min read

Summary

This paper presents a novel algorithmic framework that integrates momentum terms with stochastic line search methods to optimize finite-sum problems, particularly in large-scale deep learning contexts.
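
For orientation, a stochastic line search accepts a trial step only if it sufficiently decreases the loss measured on the current mini-batch. Below is a minimal NumPy sketch of the Armijo backtracking such frameworks build on; the function name, constants, and interface are illustrative assumptions, not the paper's implementation.

    import numpy as np

    def stochastic_armijo(f_batch, x, d, g, alpha0=1.0, c=1e-4, rho=0.5,
                          max_backtracks=20):
        """Backtracking Armijo search on a single mini-batch loss.

        f_batch : callable returning the mini-batch loss at a point
        x       : current iterate (NumPy array)
        d       : search direction (e.g., negative gradient plus momentum)
        g       : mini-batch gradient at x; g.dot(d) < 0 is assumed
        """
        fx = f_batch(x)
        slope = g.dot(d)  # directional derivative; negative for descent
        alpha = alpha0
        for _ in range(max_backtracks):
            # sufficient-decrease test on the SAME mini-batch used for g
            if f_batch(x + alpha * d) <= fx + c * alpha * slope:
                return alpha
            alpha *= rho  # shrink the step and retry
        return alpha  # smallest step tried; a caller may still reject it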

Why It Matters

The research addresses the challenge of efficiently optimizing finite-sum problems, which are ubiquitous in machine learning training. By combining momentum techniques with stochastic line searches, the proposed method achieves provable convergence under suitable assumptions and faster training in practice, making it directly relevant to practitioners in AI and optimization.

Key Takeaways

  • Introduces a framework that combines momentum terms with stochastic line searches.
  • Demonstrates state-of-the-art performance on both convex and nonconvex large-scale training problems.
  • Highlights the importance of mini-batch persistency for making this combination computationally efficient (see the sketch after this list).
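
Mini-batch persistency means reusing the same sampled batch for several consecutive iterations, so that the loss values computed during the line search stay comparable and can be recycled. A minimal sketch of such a sampler, assuming NumPy arrays and an illustrative `persistence` reuse count (a hypothetical knob, not the paper's notation):

    import numpy as np

    def persistent_batches(X, y, batch_size, persistence, rng):
        """Yield mini-batches, reusing each sampled batch `persistence` times.

        Keeping the batch fixed across consecutive iterations means the
        line search compares loss values on one consistent sample.
        """
        n = X.shape[0]
        while True:  # infinite stream; the training loop decides when to stop
            idx = rng.choice(n, size=batch_size, replace=False)
            for _ in range(persistence):
                yield X[idx], y[idx]

One plausible saving this enables: with a persistence of two, the batch loss at the accepted iterate can be reused as the starting value of the next line search on the same batch, avoiding a fresh forward pass.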

Mathematics > Optimization and Control (arXiv:2411.07102)

[Submitted on 11 Nov 2024 (v1), last revised 23 Feb 2026 (this version, v4)]

Title: Effectively Leveraging Momentum Terms in Stochastic Line Search Frameworks for Fast Optimization of Finite-Sum Problems

Authors: Matteo Lapucci, Davide Pucci

Abstract: In this work, we address unconstrained finite-sum optimization problems, with particular focus on instances originating in large-scale deep learning scenarios. Our main interest lies in the relationship between recent line search approaches for stochastic optimization in the overparametrized regime and momentum directions. First, we point out that combining these two elements with computational benefits is not straightforward; to this end, we propose a solution based on mini-batch persistency. We then introduce an algorithmic framework that exploits a mix of data persistency, conjugate-gradient-type rules for the definition of the momentum parameter, and stochastic line searches. The resulting algorithm provably possesses convergence properties under suitable assumptions and is empirically shown to outperform other popular methods from the literature, obtaining state-of-the-art results in both convex and nonconvex large-scale training problems.
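
To make the abstract concrete, here is a self-contained sketch of one iteration mixing the three ingredients it names. The Polak-Ribière-plus formula and the descent-direction safeguard are classical conjugate-gradient choices used here as assumptions; the paper's exact rule may differ, as may its handling of the first iteration and of batch changes.

    import numpy as np

    def cg_momentum_step(f_batch, grad_batch, x, d_prev, g_prev,
                         alpha0=1.0, c=1e-4, rho=0.5, max_backtracks=20):
        """One sketched iteration: persistent batch + CG-type momentum + Armijo.

        f_batch and grad_batch evaluate the loss and gradient on the SAME
        persistent mini-batch; d_prev and g_prev are the previous direction
        and gradient (None on the first call).
        """
        g = grad_batch(x)
        if d_prev is None:
            d = -g  # no history yet: plain mini-batch steepest descent
        else:
            # Polak-Ribiere-plus rule for the momentum coefficient (assumed)
            beta = max(0.0, g.dot(g - g_prev) / g_prev.dot(g_prev))
            d = -g + beta * d_prev
            if g.dot(d) >= 0.0:
                d = -g  # safeguard: revert if d is not a descent direction
        # Armijo backtracking on the same mini-batch that produced g
        fx, slope, alpha = f_batch(x), g.dot(d), alpha0
        for _ in range(max_backtracks):
            if f_batch(x + alpha * d) <= fx + c * alpha * slope:
                break
            alpha *= rho
        return x + alpha * d, d, g

A driver loop would pair this step with a persistent sampler like the one sketched under Key Takeaways; one would presumably reset d_prev and g_prev to None whenever a fresh batch is drawn, since gradients from different batches are not directly comparable in the momentum formula.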

Related Articles

Machine Learning

[R] VOID: Video Object and Interaction Deletion (physically-consistent video inpainting)

We present VOID, a model for video object removal that aims to handle *physical interactions*, not just appearance. Most existing video i...

Reddit - Machine Learning · 1 min ·
Machine Learning

FLUX 2 Pro (2026) Sketch to Image

I sketched a cow and tested how different models interpret it into a realistic image for downstream 3D generation, turns out some models ...

Reddit - Artificial Intelligence · 1 min ·
Machine Learning

Improving AI models’ ability to explain their predictions

AI News - General · 9 min ·
Machine Learning

[D] TMLR reviews seem more reliable than ICML/NeurIPS/ICLR

This year I submitted a paper to ICML for the first time. I have also experienced the review process at TMLR and ICLR. From my observatio...

Reddit - Machine Learning · 1 min ·