Google Cloud C4 Brings a 70% TCO improvement on GPT OSS with Intel and Hugging Face
Published October 16, 2025

Authors: Jiqing Feng (Intel), Matrix Yao (Intel), Ke Ding (Intel), Ilyas Moutawwakil (Hugging Face)

Intel and Hugging Face collaborated to demonstrate the real-world value of upgrading to Google's latest C4 Virtual Machine (VM), which runs on Intel® Xeon® 6 processors (code-named Granite Rapids, or GNR). We specifically benchmarked improvements in the text generation performance of OpenAI's GPT OSS Large Language Model (LLM). The results are in, and they are impressive: a 1.7x improvement in Total Cost of Ownership (TCO) over the previous-generation Google C3 VM instances. The Google Cloud C4 VM instance also delivered:

- 1.4x to 1.7x TPOT throughput per vCPU per dollar
- A lower price per hour than C3 VMs

Introduction

GPT OSS is the common name for the open-weight Mixture of Experts (MoE) models released by OpenAI. An MoE model is a deep neural network architecture that uses specialized "expert" sub-networks and a "gating network" that decides which experts to use for a given input. MoE models let you scale model capacity efficiently without linearly scaling compute cost. They also allow for specialization: different experts learn different skills, letting the model adapt to diverse data distributions. Even with a very large parameter count, only a small subset of experts is ...
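The expert-routing idea described above can be sketched in a few lines of NumPy. This is a minimal illustration, not GPT OSS internals: the function names, shapes, and the choice of top-2 routing here are our own assumptions for clarity.

```python
import numpy as np

def top_k_gating(x, gate_w, k=2):
    """Route one token to its top-k experts.

    x: (d,) token hidden state; gate_w: (d, n_experts) gating weights.
    Returns the selected expert indices and their normalized routing weights.
    (Hypothetical shapes/names for illustration only.)
    """
    logits = x @ gate_w                        # (n_experts,) router scores
    top = np.argsort(logits)[-k:]              # indices of the k best experts
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()                               # softmax over the selected experts only
    return top, w

def moe_layer(x, gate_w, experts, k=2):
    """Run only the selected experts and mix their outputs by routing weight."""
    idx, weights = top_k_gating(x, gate_w, k)
    return sum(wi * experts[i](x) for i, wi in zip(idx, weights))
```

Because only `k` of the `n_experts` sub-networks run per token, the per-token compute cost stays close to that of a much smaller dense model, which is exactly what makes MoE capacity scaling cheap.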