[2603.02417] Fisher-Geometric Diffusion in Stochastic Gradient Descent: Optimal Rates, Oracle Complexity, and Information-Theoretic Limits
Statistics > Machine Learning

arXiv:2603.02417 (stat)

[Submitted on 2 Mar 2026]

Title: Fisher-Geometric Diffusion in Stochastic Gradient Descent: Optimal Rates, Oracle Complexity, and Information-Theoretic Limits

Authors: Daniel Zantedeschi, Kumar Muthuraman

Abstract: We develop a Fisher-geometric theory of stochastic gradient descent (SGD) in which mini-batch noise is an intrinsic, loss-induced matrix rather than an exogenous scalar variance. Under exchangeable sampling, the mini-batch gradient covariance is pinned down (to leading order) by the projected covariance of per-sample gradients: it equals the projected Fisher information for well-specified likelihood losses and the projected Godambe (sandwich) matrix for general M-estimation losses. This identification forces a diffusion approximation with Fisher/Godambe-structured volatility (effective temperature tau = eta/b) and yields an Ornstein-Uhlenbeck linearization whose stationary covariance is given in closed form by a Fisher-Lyapunov equation. Building on this geometry, we prove matching minimax upper and lower bounds of order Theta(1/N) for Fisher/Godambe risk under a total oracle budget N; the lower bound holds under a martingale oracle condition (bounded predictable quadratic variation), strictly s...
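The following is a minimal numerical sketch of the Ornstein-Uhlenbeck picture the abstract describes, under the standard reading: near a minimum with Hessian H, SGD with step size eta and batch size b behaves like the OU process d(theta) = -H theta dt + sqrt(tau * Sigma) dW with effective temperature tau = eta/b and Sigma the Fisher/Godambe noise covariance, so the stationary covariance C solves the Lyapunov equation H C + C H^T = tau * Sigma. The dimensions, the test Hessian, and the well-specified choice Sigma = H below are illustrative assumptions, not quantities taken from the paper.

```python
# Sketch: stationary covariance of the assumed OU linearization of SGD,
# i.e. C solving the Fisher-Lyapunov equation  H C + C H^T = tau * Sigma
# with tau = eta / b. H and Sigma here are illustrative stand-ins.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

rng = np.random.default_rng(0)
d = 4                # parameter dimension (illustrative)
eta, b = 0.1, 32     # step size and batch size (illustrative)
tau = eta / b        # effective temperature tau = eta/b

# Illustrative symmetric positive-definite Hessian at the minimum.
A = rng.standard_normal((d, d))
H = A @ A.T + d * np.eye(d)

# Well-specified likelihood case: the noise covariance Sigma equals the
# Fisher information, which at the optimum matches the Hessian.
Sigma = H.copy()

# solve_continuous_lyapunov(a, q) returns X with a X + X a^H = q, so
# C = solve(H, tau * Sigma) satisfies H C + C H^T = tau * Sigma.
C = solve_continuous_lyapunov(H, tau * Sigma)

# Sanity checks: the Lyapunov residual vanishes, and in the
# well-specified case the solution collapses to (tau / 2) * I.
assert np.allclose(H @ C + C @ H.T, tau * Sigma)
assert np.allclose(C, 0.5 * tau * np.eye(d))
print("stationary covariance C =\n", C)
```

Note the closed-form consequence exercised by the final check: when Sigma = H, the equation H C + C H^T = tau * H is solved by C = (tau/2) I, so the stationary spread is set entirely by the effective temperature eta/b, consistent with the abstract's framing of tau as the single temperature-like parameter.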