Hugging Face Text Generation Inference available for AWS Inferentia2
Published February 1, 2024

Philipp Schmid (philschmid), David Corvoysier (dacorvo)

We are excited to announce the general availability of Hugging Face Text Generation Inference (TGI) on AWS Inferentia2 and Amazon SageMaker.

Text Generation Inference (TGI) is a purpose-built solution for deploying and serving Large Language Models (LLMs) for production workloads at scale. TGI enables high-performance text generation using Tensor Parallelism and continuous batching for the most popular open LLMs, including Llama, Mistral, and more. Text Generation Inference is used in production by companies such as Grammarly, Uber, Deutsche Telekom, and many more.

The integration of TGI into Amazon SageMaker, in combination with AWS Inferentia2, presents a powerful solution and a viable alternative to GPUs for building production LLM applications. The seamless integration ensures easy deployment and maintenance of models, making LLMs more accessible and scalable for a wide range of production use cases. With the new TGI for AWS Inferentia2 on Amazon SageMaker, AWS customers can benefit from the same technologies that power highly concurrent, low-latency LLM experiences like HuggingChat, OpenAssistant, and Serverless Endpoints for LLMs on the Hugging Face Hub.

Deploy Zephyr 7B on AWS Inferentia2 using Amazon SageMaker

This tutorial shows how easy it is to deploy a state-of-...
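To give an intuition for why continuous batching improves throughput, here is a toy scheduling model. This is a simplified sketch, not TGI's actual implementation, and all function names are hypothetical: new requests join the running batch as soon as a sequence finishes and frees a slot, instead of waiting for the entire batch to drain.

```python
# Toy model of continuous vs. static batching (hypothetical sketch,
# not TGI's real scheduler). Each request is (id, num_decode_steps),
# and one "step" decodes one token for every running sequence.
from collections import deque


def continuous_batching(requests, max_batch_size):
    """Return total decode steps when free slots are refilled immediately."""
    queue = deque(requests)
    running = {}  # request id -> remaining decode steps
    steps = 0
    while queue or running:
        # The "continuous" part: fill free slots before every step.
        while queue and len(running) < max_batch_size:
            rid, n = queue.popleft()
            running[rid] = n
        steps += 1
        for rid in list(running):
            running[rid] -= 1
            if running[rid] == 0:
                del running[rid]  # slot freed mid-batch
    return steps


def static_batching(requests, max_batch_size):
    """Baseline: each batch runs until its longest sequence finishes."""
    steps = 0
    for i in range(0, len(requests), max_batch_size):
        batch = requests[i:i + max_batch_size]
        steps += max(n for _, n in batch)
    return steps


reqs = [("a", 10), ("b", 2), ("c", 2), ("d", 8)]
print(continuous_batching(reqs, max_batch_size=2))  # 12 steps
print(static_batching(reqs, max_batch_size=2))      # 18 steps
```

With mixed sequence lengths, static batching wastes slots on already-finished sequences while the longest one keeps decoding; continuous batching backfills those slots, which is why it helps most when request lengths vary widely.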