[2602.13498] TrasMuon: Trust-Region Adaptive Scaling for Orthogonalized Momentum Optimizers


Summary

TrasMuon introduces a novel optimization technique that enhances the stability and efficiency of orthogonalized momentum optimizers, outperforming traditional methods in empirical tests.

Why It Matters

This research addresses critical challenges in machine learning optimization, particularly the sensitivity of existing methods to step-size hyperparameters and their vulnerability to high-energy bursts. By improving optimization stability and convergence speed, TrasMuon could meaningfully improve model training across a wide range of AI applications.

Key Takeaways

  • TrasMuon stabilizes optimization by preserving near-isometric geometry while adapting magnitudes.
  • The method incorporates global RMS calibration and energy-based trust-region clipping to enhance stability (a sketch of both appears after this list).
  • Empirical results show TrasMuon converges faster than traditional baselines in vision and language models.
  • The approach mitigates issues related to high-energy outliers that can destabilize training.
  • TrasMuon demonstrates superior robustness without requiring warmup stages.
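
Based only on the description above, here is a minimal sketch of what global RMS calibration and energy-based trust-region clipping could look like for a single orthogonalized update. The target RMS value, the EMA tracking of update energy, the ratio threshold, and the function names are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def rms_calibrate(update: torch.Tensor, target_rms: float = 0.2) -> torch.Tensor:
    # Rescale the orthogonalized update so its elementwise RMS matches a
    # global target (the constant 0.2 is an illustrative choice).
    rms = update.pow(2).mean().sqrt().clamp_min(1e-12)
    return update * (target_rms / rms)

def trust_region_clip(update: torch.Tensor, energy_ema: float,
                      max_ratio: float = 2.0, ema_decay: float = 0.99):
    # Treat the squared Frobenius norm as the update's "energy" and shrink the
    # update whenever that energy exceeds max_ratio times a running average,
    # keeping high-energy bursts inside the trust region.
    energy = update.pow(2).sum().item()
    if energy_ema > 0.0 and energy > max_ratio * energy_ema:
        update = update * ((max_ratio * energy_ema / energy) ** 0.5)
        energy = max_ratio * energy_ema
    new_ema = energy if energy_ema <= 0.0 else ema_decay * energy_ema + (1.0 - ema_decay) * energy
    return update, new_ema

# Toy usage: calibrate then clip one (already orthogonalized) update matrix.
ortho_update = torch.randn(256, 128)
energy_ema = 0.0
calibrated = rms_calibrate(ortho_update)
stepped, energy_ema = trust_region_clip(calibrated, energy_ema)
```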

Computer Science > Machine Learning
arXiv:2602.13498 (cs) [Submitted on 13 Feb 2026]

Title: TrasMuon: Trust-Region Adaptive Scaling for Orthogonalized Momentum Optimizers
Authors: Peng Cheng, Jiucheng Zang, Qingnan Li, Liheng Ma, Yufei Cui, Yingxue Zhang, Boxing Chen, Ming Jian, Wen Tong

Abstract: Muon-style optimizers leverage Newton-Schulz (NS) iterations to orthogonalize updates, yielding update geometries that often outperform Adam-series methods. However, this orthogonalization discards magnitude information, rendering training sensitive to step-size hyperparameters and vulnerable to high-energy bursts. To mitigate this, we introduce TrasMuon (Trust Region Adaptive Scaling Muon). TrasMuon preserves the near-isometric geometry of Muon while stabilizing magnitudes through (i) global RMS calibration and (ii) energy-based trust-region clipping. We demonstrate that while reintroducing adaptive scaling improves optimization efficiency, it typically exacerbates instability due to high-energy outliers. TrasMuon addresses this by defining a trust region based on relative energy ratios, confining updates to a stable zone. Empirical experiments on vision and language models demonstrate that TrasMuon converges faster than baselines. Furthermore, experiments without wa...
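
For readers unfamiliar with the orthogonalization step the abstract refers to, the sketch below applies a classic cubic Newton-Schulz iteration to a momentum matrix. Muon itself uses a tuned higher-order polynomial; the cubic form, step count, normalization, and function name here are simplified assumptions rather than the paper's exact procedure.

```python
import torch

def newton_schulz_orthogonalize(m: torch.Tensor, steps: int = 5) -> torch.Tensor:
    # Approximately orthogonalize a 2-D momentum matrix with the cubic
    # Newton-Schulz iteration X <- 1.5*X - 0.5*(X @ X^T @ X).  Dividing by the
    # Frobenius norm keeps singular values inside the iteration's convergence range.
    x = m / (m.norm() + 1e-7)
    for _ in range(steps):
        x = 1.5 * x - 0.5 * (x @ x.T @ x)
    return x

# The result has near-uniform singular values, i.e. the near-isometric geometry
# the abstract describes; the original magnitude information is discarded.
update = newton_schulz_orthogonalize(torch.randn(512, 256))
```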

