[2602.07729] Do We Need Adam? Surprisingly Strong and Sparse Reinforcement Learning with SGD in LLMs
Summary
This paper examines the effectiveness of the SGD optimizer for reinforcement learning in large language models, challenging the dominance of AdamW and highlighting SGD's substantial memory savings and the sparsity of its parameter updates.
Why It Matters
These findings question an established optimization practice in training large language models: carrying the AdamW optimizer over from pretraining into reinforcement learning. If RL needs neither Adam-style momentum nor adaptive learning rates, training can use a markedly more memory-efficient optimizer, which matters for scaling RL-based post-training of AI systems.
Key Takeaways
- SGD matches or outperforms AdamW in reinforcement learning for LLMs, despite performing poorly in supervised training.
- Reinforcement learning benefits less from adaptive learning rates than previously thought.
- SGD's updates are sparse, changing far fewer model parameters than AdamW while remaining competitive.
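The memory argument behind these takeaways is easy to make concrete. The sketch below is a back-of-envelope comparison (not a measurement from the paper, and the 7B model size is an assumption): AdamW keeps two fp32 state tensors per parameter (first and second moments), while plain SGD without momentum keeps none.

```python
# Back-of-envelope optimizer-state memory for AdamW vs. plain SGD.
# All figures are illustrative assumptions, not results from the paper.

def optimizer_state_bytes(num_params: int, states_per_param: int,
                          bytes_per_value: int = 4) -> int:
    """Extra memory the optimizer holds beyond the model weights."""
    return num_params * states_per_param * bytes_per_value

NUM_PARAMS = 7_000_000_000  # assumed: a 7B-parameter LLM

# AdamW tracks two fp32 states per parameter: the momentum term
# (first moment) and the adaptive learning-rate scale (second moment).
adamw_bytes = optimizer_state_bytes(NUM_PARAMS, states_per_param=2)

# Plain SGD (no momentum) carries no per-parameter optimizer state.
sgd_bytes = optimizer_state_bytes(NUM_PARAMS, states_per_param=0)

print(f"AdamW optimizer state: {adamw_bytes / 2**30:.1f} GiB")  # ~52.2 GiB
print(f"SGD optimizer state:   {sgd_bytes / 2**30:.1f} GiB")    # 0.0 GiB
```

At this scale, dropping Adam's two moment buffers alone frees tens of gigabytes of accelerator memory per model replica.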
Computer Science > Machine Learning
arXiv:2602.07729 (cs)
[Submitted on 7 Feb 2026 (v1), last revised 24 Feb 2026 (this version, v2)]
Title: Do We Need Adam? Surprisingly Strong and Sparse Reinforcement Learning with SGD in LLMs
Authors: Sagnik Mukherjee, Lifan Yuan, Pavan Jayasinha, Dilek Hakkani-Tür, Hao Peng
Abstract: Reinforcement learning (RL), particularly RL from verifiable reward (RLVR), has become a crucial phase of training large language models (LLMs) and a key focus of current scaling efforts. However, optimization practices in RL largely follow those of next-token prediction stages (e.g., pretraining and supervised fine-tuning), despite fundamental differences between RL and these stages highlighted by recent work. One such practice is the use of the AdamW optimizer, which is widely adopted for training large-scale transformers despite its high memory overhead. Our analysis shows that both momentum and adaptive learning rates in AdamW are less influential in RL than in SFT, leading us to hypothesize that RL benefits less from Adam-style per-parameter adaptive learning rates and momentum. Confirming this hypothesis, our experiments demonstrate that the substantially more memory-efficient SGD, which is known to perform poorly in supervised learning of large-scale transformers, ma...
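The "sparse" claim in the title refers to how few parameters actually move during RL updates. A minimal sketch of how update sparsity can be quantified, using toy numbers rather than the paper's measurements, is to count the fraction of parameters left unchanged (within a small tolerance) after one optimizer step:

```python
# Sketch: quantifying how sparse a parameter update is.
# Toy values for illustration; the paper's sparsity statistics differ.

def update_sparsity(before, after, tol=1e-12):
    """Fraction of parameters whose value changed by at most `tol`."""
    unchanged = sum(1 for b, a in zip(before, after) if abs(a - b) <= tol)
    return unchanged / len(before)

before = [0.5, -1.2, 0.0, 3.3, 2.1]
# Suppose one SGD step moved only two of the five parameters:
after = [0.5, -1.25, 0.0, 3.3, 2.4]

print(update_sparsity(before, after))  # 3 of 5 unchanged -> 0.6
```

A high sparsity value under this metric means the optimizer concentrated its update on a small subset of the model's parameters, which is the behavior the paper attributes to SGD in RL.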