[R] Architecture Determines Optimization: Deriving Weight Updates from Network Topology (seeking arXiv endorsement - cs.LG)
Abstract: We derive neural network weight updates from first principles without assuming gradient descent or a specific loss function. Starting from the error equation E = y − f(x) and linearizing with respect to the free parameters, while noting that the data x and target y are fixed observations, we obtain a linear constraint on the weight perturbations. The minimum-norm solution to this underdetermined system is algebraically identical to the standard weight gradient. Gradient descent is t...
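A minimal numerical sketch of the linearization step described in the abstract, under assumptions that are not in the post: a single linear layer f(x) = W x, one sample, and a squared-error loss used only for comparison. The names `W`, `A`, `dW_min_norm`, and `grad` are illustrative. In this toy case the minimum-norm perturbation satisfying the linearized constraint points along the negative squared-error gradient, scaled by 1/||x||^2; how the post itself treats that scale factor is not visible in the truncated abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy setup: one linear layer f(x) = W @ x and a single sample.
n_in, n_out = 5, 3
W = rng.normal(size=(n_out, n_in))   # free parameters
x = rng.normal(size=n_in)            # fixed observation
y = rng.normal(size=n_out)           # fixed target

# Error equation from the abstract: E = y - f(x).
e = y - W @ x

# Linearizing in W (x and y held fixed) gives the constraint dW @ x = e:
# n_out equations in n_out * n_in unknowns, so it is underdetermined.
# With dW flattened row by row, this is A @ vec(dW) = e, A = kron(I, x^T).
A = np.kron(np.eye(n_out), x[None, :])

# Minimum-norm solution via the Moore-Penrose pseudoinverse.
dW_min_norm = (np.linalg.pinv(A) @ e).reshape(n_out, n_in)

# For comparison only: gradient of 0.5 * ||y - W x||^2 with respect to W.
grad = -np.outer(e, x)

# The two agree up to sign and the scalar 1 / ||x||^2.
scale = 1.0 / (x @ x)
matches_gradient = np.allclose(dW_min_norm, -scale * grad)

# Because f is linear in W here, the update removes the error exactly.
residual = y - (W + dW_min_norm) @ x
```

Since this layer is linear in its weights, the linearization is exact and `residual` vanishes up to floating-point error; for a nonlinear network the same construction would only hold to first order.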