[2509.18001] Unveiling m-Sharpness Through the Structure of Stochastic Gradient Noise
Computer Science > Machine Learning

arXiv:2509.18001 (cs)

[Submitted on 22 Sep 2025 (v1), last revised 2 Apr 2026 (this version, v5)]

Title: Unveiling m-Sharpness Through the Structure of Stochastic Gradient Noise

Authors: Haocheng Luo, Mehrtash Harandi, Dinh Phung, Trung Le

Abstract: Sharpness-aware minimization (SAM) has emerged as a highly effective technique for improving model generalization, but its underlying principles are not fully understood. We investigate m-sharpness, the phenomenon in which SAM performance improves monotonically as the micro-batch size used to compute perturbations decreases; it is critical for distributed training yet has lacked a rigorous explanation. We leverage an extended Stochastic Differential Equation (SDE) framework and analyze stochastic gradient noise (SGN) to characterize the dynamics of SAM variants, including n-SAM and m-SAM. Our analysis reveals that stochastic perturbations induce an implicit variance-based sharpness regularization whose strength increases as m decreases. Motivated by this insight, we propose Reweighted SAM (RW-SAM), which employs sharpness-weighted sampling to mimic the generalization benefits of m-SAM while remaining parallelizable. Comprehensive experiments validate our theory and methods. Code is available at this https URL.

Subjects: Machine Learning (cs.LG)
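For concreteness, below is a minimal sketch of an m-SAM update as described in the abstract: the SAM perturbation is computed separately on each micro-batch of size m, and the resulting gradients are averaged before the parameter update. It assumes a standard PyTorch training setup; the function name m_sam_step and the hyperparameters rho and lr are illustrative assumptions, not the authors' implementation, and RW-SAM's sharpness-weighted sampling is not shown.

    import torch

    def m_sam_step(model, loss_fn, inputs, targets, m, rho=0.05, lr=0.1):
        # One m-SAM step: split the batch into micro-batches of size m, compute
        # the SAM perturbation separately on each micro-batch, and average the
        # resulting gradients before taking a plain SGD step.
        params = [p for p in model.parameters() if p.requires_grad]
        avg_grads = [torch.zeros_like(p) for p in params]
        chunks = list(zip(inputs.split(m), targets.split(m)))

        for x, y in chunks:
            # Gradient of the micro-batch loss at the current point.
            model.zero_grad()
            loss_fn(model(x), y).backward()
            grads = [p.grad.detach().clone() for p in params]
            norm = torch.sqrt(sum(g.pow(2).sum() for g in grads)).item() + 1e-12

            # Ascend to the perturbed point w + rho * g / ||g||.
            with torch.no_grad():
                for p, g in zip(params, grads):
                    p.add_(g, alpha=rho / norm)

            # Gradient at the perturbed point: the micro-batch SAM gradient.
            model.zero_grad()
            loss_fn(model(x), y).backward()
            with torch.no_grad():
                for p, g, acc in zip(params, grads, avg_grads):
                    acc.add_(p.grad / len(chunks))
                    p.sub_(g, alpha=rho / norm)   # undo the perturbation

        # Apply the averaged SAM gradient with vanilla SGD.
        with torch.no_grad():
            for p, acc in zip(params, avg_grads):
                p.sub_(acc, alpha=lr)

Setting m equal to the full batch size recovers ordinary (n-SAM style) batch perturbation, while smaller m corresponds to the regime where, per the abstract, the implicit variance-based sharpness regularization becomes stronger.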