[2602.13498] TrasMuon: Trust-Region Adaptive Scaling for Orthogonalized Momentum Optimizers
Summary
TrasMuon is a novel optimization technique that improves the stability and efficiency of orthogonalized momentum optimizers, outperforming traditional baselines in empirical tests.
Why It Matters
This research addresses critical challenges in machine learning optimization, particularly the sensitivity of existing methods to hyperparameters and high-energy bursts. By improving optimization stability and convergence rates, TrasMuon could significantly enhance model training processes across various applications in AI.
Key Takeaways
- TrasMuon stabilizes optimization by preserving near-isometric geometry while adapting magnitudes.
- The method incorporates global RMS calibration and energy-based trust-region clipping to enhance stability.
- Empirical results show TrasMuon converges faster than traditional baselines in vision and language models.
- The approach mitigates issues related to high-energy outliers that can destabilize training.
- TrasMuon demonstrates superior robustness without requiring warmup stages.
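The two stabilizing mechanisms above can be sketched in a few lines. The paper's exact formulas are not given in this summary, so the function below is a hypothetical illustration under two assumptions: "global RMS calibration" rescales the orthogonalized update to a target root-mean-square magnitude, and "energy-based trust-region clipping" bounds the ratio of the update's energy (squared Frobenius norm) to the raw gradient's energy. The name `tras_scale` and both hyperparameters are illustrative, not from the paper.

```python
import numpy as np

def tras_scale(update, grad, target_rms=1.0, energy_ratio_max=2.0):
    """Hypothetical sketch of TrasMuon-style magnitude stabilization.

    `update` is the orthogonalized (Muon-style) update matrix and `grad`
    the raw gradient; both are NumPy arrays of the same shape.
    """
    # (i) Global RMS calibration: rescale so the update's RMS entry
    # magnitude matches target_rms, restoring usable magnitude info.
    rms = np.sqrt(np.mean(update ** 2))
    scaled = update * (target_rms / (rms + 1e-12))

    # (ii) Energy-based trust-region clipping: if the update's energy
    # relative to the gradient's energy exceeds the trust-region bound,
    # shrink it back onto the boundary of the stable zone.
    ratio = np.sum(scaled ** 2) / (np.sum(grad ** 2) + 1e-12)
    if ratio > energy_ratio_max:
        scaled = scaled * np.sqrt(energy_ratio_max / ratio)
    return scaled
```

In this reading, the trust region acts like gradient clipping but in relative rather than absolute terms: a high-energy burst in the gradient cannot push the calibrated update beyond a fixed multiple of the gradient's own energy.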
Computer Science > Machine Learning
arXiv:2602.13498 (cs) [Submitted on 13 Feb 2026]
Title: TrasMuon: Trust-Region Adaptive Scaling for Orthogonalized Momentum Optimizers
Authors: Peng Cheng, Jiucheng Zang, Qingnan Li, Liheng Ma, Yufei Cui, Yingxue Zhang, Boxing Chen, Ming Jian, Wen Tong
Abstract: Muon-style optimizers leverage Newton-Schulz (NS) iterations to orthogonalize updates, yielding update geometries that often outperform Adam-series methods. However, this orthogonalization discards magnitude information, rendering training sensitive to step-size hyperparameters and vulnerable to high-energy bursts. To mitigate this, we introduce TrasMuon (Trust Region Adaptive Scaling Muon). TrasMuon preserves the near-isometric geometry of Muon while stabilizing magnitudes through (i) global RMS calibration and (ii) energy-based trust-region clipping. We demonstrate that while reintroducing adaptive scaling improves optimization efficiency, it typically exacerbates instability due to high-energy outliers. TrasMuon addresses this by defining a trust region based on relative energy ratios, confining updates to a stable zone. Empirical experiments on vision and language models demonstrate that TrasMuon converges faster than baselines. Furthermore, experiments without wa...
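For context on the Newton-Schulz iterations the abstract refers to: Muon-style optimizers apply a fixed number of polynomial matrix iterations to push the momentum matrix toward its nearest (semi-)orthogonal factor, which is what makes the update near-isometric while discarding magnitude. A minimal sketch follows; the quintic coefficients are one common choice popularized by Muon implementations, not necessarily what this paper uses.

```python
import numpy as np

def newton_schulz_orthogonalize(M, steps=5):
    """Sketch of the Newton-Schulz orthogonalization used by Muon-style
    optimizers: iterate a fixed odd polynomial of M that drives all
    singular values toward 1, leaving the singular vectors intact.
    """
    # Coefficients of the quintic variant; assumed, not from the paper.
    a, b, c = 3.4445, -4.7750, 2.0315
    # Normalize so the spectral norm is below 1 and the iteration converges.
    X = M / (np.linalg.norm(M) + 1e-7)
    transposed = X.shape[0] > X.shape[1]
    if transposed:  # work with the wide orientation
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X
```

After a handful of steps the singular values of the output cluster near 1 (orthogonality up to the iteration's tolerance), which is exactly the magnitude information TrasMuon's calibration and trust-region clipping then reintroduce in a controlled way.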