[2602.17423] Convergence Analysis of Two-Layer Neural Networks under Gaussian Input Masking
Summary
This paper analyzes the convergence of two-layer neural networks trained on Gaussian-masked inputs, establishing linear convergence up to an error region proportional to the mask's variance via a Neural Tangent Kernel (NTK) analysis.
Why It Matters
Understanding the convergence behavior of neural networks under noisy inputs is crucial for applications in sensor networks, privacy-preserving training, and federated learning, where each user may only see partial or corrupted features. This research provides theoretical guarantees for training in exactly these conditions.
Key Takeaways
- The paper analyzes two-layer ReLU networks trained with Gaussian input masking (a minimal training sketch follows this list).
- It establishes linear convergence guarantees up to an error region proportional to the mask's variance.
- The findings are relevant for applications in federated learning and sensor networks.
- A key technical contribution is resolving the randomness the mask injects inside the non-linear activation, a problem of independent interest.
- The Neural Tangent Kernel (NTK) analysis supplies the theoretical framework for the convergence proof.
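To make the setup concrete, here is a minimal sketch (not the paper's code) of the scenario the abstract describes: a two-layer ReLU network in an NTK-style parameterization, trained by gradient descent on inputs perturbed by a Gaussian mask. The sizes, step size, mask variance, and the choice of multiplicative N(1, sigma^2) masking are illustrative assumptions; the paper's exact masking model and parameterization may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes, not taken from the paper.
n, d, m = 64, 10, 256     # samples, input dimension, hidden width
sigma = 0.1               # standard deviation of the Gaussian mask
eta = 0.1                 # gradient-descent step size

# Synthetic regression data.
X = rng.standard_normal((n, d)) / np.sqrt(d)
y = rng.standard_normal(n)

# Two-layer ReLU network, f(x) = (1/sqrt(m)) * sum_r a_r * relu(w_r . x).
# Only W is trained; the output signs a_r stay fixed, a common
# simplification in NTK-style analyses.
W = rng.standard_normal((m, d))
a = rng.choice([-1.0, 1.0], size=m)

def forward(W, X):
    """Network outputs on a batch of inputs."""
    return np.maximum(X @ W.T, 0.0) @ a / np.sqrt(m)

for t in range(501):
    # Gaussian input masking: multiplicative N(1, sigma^2) noise per
    # feature, one reading of "Gaussian dropout at the input level".
    X_masked = X * (1.0 + sigma * rng.standard_normal(X.shape))

    pre = X_masked @ W.T                  # (n, m) pre-activations
    pred = np.maximum(pre, 0.0) @ a / np.sqrt(m)
    err = pred - y                        # residuals on the masked batch

    # Gradient of 0.5 * ||pred - y||^2 with respect to W.
    active = (pre > 0.0).astype(X.dtype)  # ReLU gates
    grad = a[:, None] * ((err[:, None] * active).T @ X_masked) / np.sqrt(m)
    W -= eta * grad

    if t % 100 == 0:
        loss = 0.5 * np.sum((forward(W, X) - y) ** 2)
        print(f"step {t:4d}  clean-input loss {loss:.4f}")
```

In line with the paper's claim, the clean-input loss in such a setup shrinks rapidly at first and then stalls at a floor; the smaller sigma is, the lower that floor sits.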
Computer Science > Machine Learning
arXiv:2602.17423 (cs) [Submitted on 19 Feb 2026]
Title: Convergence Analysis of Two-Layer Neural Networks under Gaussian Input Masking
Authors: Afroditi Kolomvaki, Fangshuo Liao, Evan Dramko, Ziyun Guang, Anastasios Kyrillidis
Abstract: We investigate the convergence guarantee of two-layer neural network training with Gaussian randomly masked inputs. This scenario corresponds to Gaussian dropout at the input level, or noisy input training common in sensor networks, privacy-preserving training, and federated learning, where each user may have access to partial or corrupted features. Using a Neural Tangent Kernel (NTK) analysis, we demonstrate that training a two-layer ReLU network with Gaussian randomly masked inputs achieves linear convergence up to an error region proportional to the mask's variance. A key technical contribution is resolving the randomness within the non-linear activation, a problem of independent interest.
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS); Optimization and Control (math.OC)
Cite as: arXiv:2602.17423 [cs.LG] (or arXiv:2602.17423v1 [cs.LG] for this version)
DOI: https://doi.org/10.48550/arXiv.2602.17423
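As a hedged gloss on the abstract's main claim, "linear convergence up to an error region proportional to the mask's variance" typically takes the following schematic form in NTK analyses; the rate, the constant C, and the exact dependence on sigma^2 below are illustrative, not the paper's theorem:

```latex
% Schematic NTK-style guarantee (illustrative; not the paper's exact statement).
% f_t: network outputs at step t, y: targets, \eta: step size,
% \lambda_0: smallest eigenvalue of the limiting NTK Gram matrix,
% \sigma^2: variance of the Gaussian input mask, C: a problem-dependent constant.
\[
  \lVert f_t - y \rVert_2^2
  \;\le\;
  (1 - \eta \lambda_0)^{t}\, \lVert f_0 - y \rVert_2^2 \;+\; C\,\sigma^2 ,
\]
% i.e., the loss contracts geometrically ("linear convergence") until it
% reaches a noise floor whose size scales with the mask's variance.
```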