[2509.18129] Pareto-optimal Trade-offs Between Communication and Computation with Flexible Gradient Tracking
Summary
This paper presents FlexGT, a flexible gradient tracking method for distributed stochastic optimization that balances communication and computation per round, achieving Pareto-optimal trade-offs between the two.
Why It Matters
The research addresses central challenges in distributed optimization, particularly under non-i.i.d. data. By proposing a flexible gradient tracking method with tunable local updates and neighbor communications, it lets the computation–communication balance be adapted to the system at hand, which is vital for training machine learning models in resource-constrained environments.
Key Takeaways
- FlexGT enables tunable local updates and neighbor communications for efficient distributed optimization.
- The proposed method achieves tight iteration and communication complexities, improving on existing bounds.
- Numerical experiments validate the theoretical findings, demonstrating practical applicability.
Mathematics > Optimization and Control
arXiv:2509.18129 (math)
[Submitted on 11 Sep 2025 (v1), last revised 15 Feb 2026 (this version, v2)]
Title: Pareto-optimal Trade-offs Between Communication and Computation with Flexible Gradient Tracking
Authors: Yan Huang, Jinming Xu, Li Chai, Jiming Chen, Karl H. Johansson
Abstract: This paper addresses distributed stochastic optimization problems under non-i.i.d. data, focusing on the inherent trade-offs between communication and computational efficiency. To this end, we propose FlexGT, a flexible snapshot gradient tracking method that enables tunable numbers of local updates and neighbor communications per round, thereby adapting efficiently to diverse system resource conditions. Leveraging a unified convergence analysis framework, we derive tight communication and computational complexity for FlexGT with explicit dependence on objective properties and certain tunable parameters. Moreover, we introduce an accelerated variant, termed Acc-FlexGT, and prove that, with prior knowledge of the graph, it achieves Pareto-optimal trade-offs between communication and computation. Particularly, in the nonconvex case, Acc-FlexGT achieves the optimal iteration complexity of $\tilde{\mathcal{O}}\left( \left( L\sigma ^2 \right) /\left( n\epsilon ^2 \right) +L/\l...
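The abstract's description of tunable local updates and neighbor communications can be illustrated with a minimal gradient-tracking loop. This is a hedged sketch only: the function names (`flexgt`, `gossip`), the placement of the gradient snapshot, and all defaults below are illustrative assumptions, not the paper's exact recursion or step-size schedule.

```python
import numpy as np

def gossip(W, Z, rounds):
    """Apply `rounds` steps of neighbor averaging with mixing matrix W."""
    for _ in range(rounds):
        Z = W @ Z
    return Z

def flexgt(grad, x0, W, alpha=0.05, K=1, R=1, T=100):
    """Sketch of a FlexGT-style loop with K local updates and R gossip
    rounds per iteration (illustrative, not the authors' exact method).

    grad(x, i): (stochastic) gradient of node i's local objective at x
    x0:         (n, d) array of initial iterates, one row per node
    W:          (n, n) doubly stochastic mixing matrix
    """
    n, _ = x0.shape
    X = x0.copy()
    G = np.stack([grad(X[i], i) for i in range(n)])  # snapshot gradients
    Y = G.copy()                                     # gradient trackers
    for _ in range(T):
        for _ in range(K):          # K local steps along the tracked direction
            X = X - alpha * Y
        X = gossip(W, X, R)         # R communication rounds on the models
        G_new = np.stack([grad(X[i], i) for i in range(n)])
        Y = gossip(W, Y, R) + G_new - G  # gradient-tracking correction
        G = G_new
    return X.mean(axis=0)
```

Raising K shifts work toward local computation, while raising R spends more communication to keep the trackers aligned across neighbors; the paper's contribution is characterizing which (K, R) choices are Pareto-optimal for given objective and graph properties.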