[D] The Bitter Lesson of Optimization: Why training Neural Networks to update themselves is mathematically brutal (but probably inevitable)
Are we still stuck in the "feature engineering" era of optimization? We trust neural networks to learn unimaginably complex patterns from data, yet the algorithms we use to train them (like Adam or AdamW) are entirely hand-designed by humans. Richard Sutton's famous "Bitter Lesson" argues that hand-crafted heuristics ultimately lose to general methods that leverage learning and computation. So why aren't we all using torch.optim.NeuralNetOptimizer to train our LLMs today?
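To make the contrast concrete, here is a minimal sketch (assuming PyTorch) of what a learned optimizer could look like: instead of Adam's fixed update rule, a tiny MLP maps per-parameter features (the gradient and a running momentum) to an update. Everything below is an illustrative assumption on my part; there is no LearnedOptimizer or NeuralNetOptimizer in torch.optim. The hard part, meta-training that MLP by backpropagating through many inner optimization steps, is where the "mathematically brutal" piece comes in.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of a "learned optimizer": a small MLP replaces the
# hand-designed update rule. The class name, feature choice, and step scale
# are all illustrative assumptions, not an existing torch.optim API.
class LearnedOptimizer(nn.Module):
    def __init__(self, hidden=16, beta=0.9):
        super().__init__()
        self.beta = beta
        # Per-parameter input features: [gradient, momentum] -> scalar update.
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )
        self.momentum = {}

    @torch.no_grad()
    def step(self, params):
        for i, p in enumerate(params):
            if p.grad is None:
                continue
            # Hand-rolled momentum buffer, one entry per parameter tensor.
            m = self.momentum.get(i, torch.zeros_like(p))
            m = self.beta * m + (1 - self.beta) * p.grad
            self.momentum[i] = m
            # Feed per-element features through the MLP to get the update.
            feats = torch.stack([p.grad.flatten(), m.flatten()], dim=-1)
            update = self.net(feats).view_as(p)
            p.add_(0.01 * update)  # small fixed scale for stability

# Usage: the learned optimizer updates a "learner" model's weights.
# Meta-training the optimizer itself would mean backpropagating through
# many such inner steps, which is the hard part discussed in this post.
model = nn.Linear(4, 1)
opt = LearnedOptimizer()
x, y = torch.randn(8, 4), torch.randn(8, 1)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
opt.step(list(model.parameters()))
```

Note that the sketch applies updates under no_grad, exactly like a hand-designed optimizer; it says nothing about how self.net gets its weights, which is the whole problem.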