[2406.00300] Coded Computing for Resilient Distributed Computing: A Learning-Theoretic Framework
Computer Science > Machine Learning
arXiv:2406.00300 (cs)
[Submitted on 1 Jun 2024 (v1), last revised 24 Mar 2026 (this version, v3)]

Title: Coded Computing for Resilient Distributed Computing: A Learning-Theoretic Framework
Authors: Parsa Moradi, Behrooz Tahmasebi, Mohammad Ali Maddah-Ali

Abstract: Coded computing has emerged as a promising framework for tackling significant challenges in large-scale distributed computing, including the presence of slow, faulty, or compromised servers. In this approach, each worker node processes a combination of the data, rather than the raw data itself. The final result is then decoded from the collective outputs of the worker nodes. However, there is a significant gap between current coded computing approaches and the broader landscape of general distributed computing, particularly when it comes to machine learning workloads. To bridge this gap, we propose a novel foundation for coded computing that integrates the principles of learning theory, and we develop a framework that adapts seamlessly to machine learning applications. In this framework, the objective is to find the encoder and decoder functions that minimize the loss function, defined as the mean squared error between the estimated and true values. To facilitate the search for the optimal encoding and decoding functions, we show tha...