[2510.21314] A Convergence Analysis of Adaptive Optimizers under Floating-point Quantization
Computer Science > Machine Learning

arXiv:2510.21314 (cs)
[Submitted on 24 Oct 2025 (v1), last revised 1 Mar 2026 (this version, v2)]

Title: A Convergence Analysis of Adaptive Optimizers under Floating-point Quantization
Authors: Xuan Tang, Jichu Li, Difan Zou

Abstract: The rapid scaling of large language models (LLMs) has made low-precision training essential for reducing memory, improving efficiency, and enabling larger models and datasets. Existing convergence theories for adaptive optimizers, however, assume all components are exact and neglect hardware-aware quantization, leaving open the question of why low-precision training remains effective. We introduce the first theoretical framework for analyzing the convergence of adaptive optimizers, including Adam and Muon, under floating-point quantization of gradients, weights, and optimizer states (e.g., moment estimates). Within this framework, we derive convergence rates on smooth non-convex objectives under standard stochastic gradient assumptions, explicitly characterizing how quantization errors from different components affect convergence. We show that both algorithms retain rates close to their full-precision counterparts provided mantissa length scales only logarithmically with the number of iterations. Our analysis further reveals that Adam is highly sensitive to w...
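The setting the abstract describes, Adam with floating-point quantization applied to gradients, weights, and moment estimates, can be illustrated with a small sketch. The snippet below is not the paper's construction; it is a minimal simulation assuming round-to-nearest mantissa truncation (relative error bounded by 2^-b for b mantissa bits) and a hypothetical `quantized_adam_step` on a scalar objective.

```python
import math

def fp_quantize(x, mantissa_bits):
    """Round x to the nearest float with `mantissa_bits` bits of mantissa
    (round-to-nearest; relative error is at most 2**-mantissa_bits)."""
    if x == 0.0 or not math.isfinite(x):
        return x
    m, e = math.frexp(x)                  # x = m * 2**e with 0.5 <= |m| < 1
    scale = 2.0 ** mantissa_bits
    return math.ldexp(round(m * scale) / scale, e)

def quantized_adam_step(w, g, m, v, t, lr=0.1, b1=0.9, b2=0.999,
                        eps=1e-8, bits=8):
    """One Adam step where the gradient, both moment estimates, and the
    updated weight are each stored in low precision (a simplified model
    of the quantized components named in the abstract)."""
    g = fp_quantize(g, bits)                          # quantized gradient
    m = fp_quantize(b1 * m + (1 - b1) * g, bits)      # quantized 1st moment
    v = fp_quantize(b2 * v + (1 - b2) * g * g, bits)  # quantized 2nd moment
    m_hat = m / (1 - b1 ** t)                         # bias correction
    v_hat = v / (1 - b2 ** t)
    w = fp_quantize(w - lr * m_hat / (math.sqrt(v_hat) + eps), bits)
    return w, m, v

# Toy run: minimize f(w) = w**2 (gradient 2w) with 8-bit mantissas.
w, m, v = 1.0, 0.0, 0.0
for t in range(1, 201):
    w, m, v = quantized_adam_step(w, 2.0 * w, m, v, t, bits=8)
```

Despite every stored quantity being rounded to 8 mantissa bits, the iterate still approaches the minimizer on this toy problem, consistent with the abstract's claim that modest mantissa lengths suffice.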