Excellent discussion about LLM scaling [D]
About this article
I came across an excellent, in-depth discussion of memory and compute scaling analysis for LLMs: "How GPT, Claude, and Gemini are actually trained and served" with Reiner Pope. One takeaway is that running LLMs locally or on a private cloud is wasteful: memory/compute scaling makes large batching during inference very efficient. Highly recommended.

submitted by /u/geneing
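To see why large batches help, here is a back-of-envelope sketch (not from the linked talk; the model size and GPU specs are illustrative assumptions). During decoding, the model's weights must be streamed from memory once per step regardless of batch size, so serving more requests per step raises arithmetic intensity roughly linearly until the accelerator becomes compute-bound:

```python
# Back-of-envelope: why large-batch decoding amortizes weight traffic.
# Assumptions (hypothetical, for illustration): a dense 70B-parameter
# model in fp16 on an A100-class GPU; KV-cache traffic is ignored.

PARAMS = 70e9                    # model parameters
BYTES_PER_PARAM = 2              # fp16 weights
FLOPS_PER_PARAM_PER_TOKEN = 2    # one multiply-add per parameter

PEAK_FLOPS = 312e12              # ~A100 fp16 tensor-core peak
PEAK_BW = 2.0e12                 # ~A100 HBM bandwidth, bytes/s

def arithmetic_intensity(batch: int) -> float:
    """FLOPs performed per byte of weights read in one decode step."""
    flops = FLOPS_PER_PARAM_PER_TOKEN * PARAMS * batch
    bytes_moved = BYTES_PER_PARAM * PARAMS  # weights read once, shared by the batch
    return flops / bytes_moved              # simplifies to `batch` here

ridge = PEAK_FLOPS / PEAK_BW  # ~156 FLOPs/byte for these specs
for b in (1, 8, 64, 256):
    ai = arithmetic_intensity(b)
    bound = "compute-bound" if ai >= ridge else "memory-bound"
    print(f"batch={b:4d}  intensity={ai:6.1f} FLOPs/byte  ({bound})")
```

With these numbers, batch size 1 (a typical local setup) achieves about 1 FLOP per byte against a hardware ridge point near 156, so the GPU sits almost entirely idle waiting on memory; a large shared-serving batch closes that gap, which is the efficiency argument the post summarizes.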