[D] How come Muon is only being used for Transformers?
Muon has quickly been adopted in LLM training, yet we don't see it being talked about in other contexts. Searches for Muon on ConvNets turn up basically no results, despite its announcement including a new training speed record for CIFAR-10. In my experience faster training usually comes with better final models, so what's the deal? Does it not actually scale? Have I missed papers?
submitted by /u/lukeiy