[2602.15322] On Surprising Effectiveness of Masking Updates in Adaptive Optimizers

Summary

This paper explores the effectiveness of randomly masking updates in adaptive optimizers for training large language models, introducing a new method called Momentum-aligned gradient masking (Magma) that shows significant performance improvements.

Why It Matters

As machine learning models, particularly large language models, become increasingly complex, optimizing their training processes is crucial. This research challenges traditional adaptive optimizers by demonstrating that masking updates can enhance performance, potentially leading to more efficient training methodologies.

Key Takeaways

  • Randomly masking parameter updates can be highly effective: a masked variant of RMSProp consistently outperforms recent state-of-the-art optimizers.
  • The new method, Magma, provides consistent performance gains with minimal computational overhead.
  • At the 1B model size, Magma reduces perplexity by over 19% compared to Adam and over 9% compared to Muon.
  • The study introduces a curvature-dependent geometric regularization that smooths optimization trajectories.
  • This research could influence future approaches to training large language models.
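The masked-update idea in the takeaways can be illustrated with a minimal RMSProp variant. This is a sketch under stated assumptions: the Bernoulli mask and the `keep_prob` rate are illustrative choices, not the paper's exact masking recipe.

```python
import numpy as np

def masked_rmsprop_step(param, grad, v, lr=1e-3, beta=0.999,
                        eps=1e-8, keep_prob=0.5, rng=None):
    """One RMSProp step with random per-parameter update masking.

    Illustrative only: the paper's precise masking scheme is not
    given in this summary; `keep_prob` and the Bernoulli mask are
    assumptions made for the sketch.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Standard RMSProp second-moment accumulator.
    v = beta * v + (1.0 - beta) * grad ** 2
    update = lr * grad / (np.sqrt(v) + eps)
    # Randomly zero out a fraction of the coordinate-wise updates.
    mask = rng.random(param.shape) < keep_prob
    param = param - mask * update
    return param, v
```

Setting `keep_prob=1.0` recovers plain RMSProp, which makes the masked variant easy to A/B test against the dense baseline.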

Computer Science > Machine Learning

arXiv:2602.15322 (cs.LG) · Submitted on 17 Feb 2026

Title: On Surprising Effectiveness of Masking Updates in Adaptive Optimizers
Authors: Taejong Joo, Wenhan Xia, Cheolmin Kim, Ming Zhang, Eugene Ie

Abstract: Training large language models (LLMs) relies almost exclusively on dense adaptive optimizers with increasingly sophisticated preconditioners. We challenge this by showing that randomly masking parameter updates can be highly effective, with a masked variant of RMSProp consistently outperforming recent state-of-the-art optimizers. Our analysis reveals that the random masking induces a curvature-dependent geometric regularization that smooths the optimization trajectory. Motivated by this finding, we introduce Momentum-aligned gradient masking (Magma), which modulates the masked updates using momentum-gradient alignment. Extensive LLM pre-training experiments show that Magma is a simple drop-in replacement for adaptive optimizers with consistent gains and negligible computational overhead. Notably, for the 1B model size, Magma reduces perplexity by over 19% and 9% compared to Adam and Muon, respectively.

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
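The abstract says only that Magma "modulates the masked updates using momentum-gradient alignment"; the concrete rule is not given here. The sketch below, which keeps a coordinate's update more often when the momentum and gradient agree in sign, is an illustrative guess at such a scheme, and the 0.75/0.25 keep probabilities are invented for the example.

```python
import numpy as np

def alignment_modulated_mask(momentum, grad, rng=None):
    """Hypothetical momentum-aligned masking (not the paper's formula).

    Keeps an update with higher probability where momentum and
    gradient have the same sign; both probabilities are assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    aligned = np.sign(momentum) == np.sign(grad)
    # Assumed keep probabilities: higher where the signs agree.
    keep_prob = np.where(aligned, 0.75, 0.25)
    return rng.random(grad.shape) < keep_prob
```

The returned boolean mask would multiply the preconditioned update, in the same way the random mask does in a masked RMSProp step.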
