Text-Generation Pipeline on Intel® Gaudi® 2 AI Accelerator

Hugging Face Blog February 15, 2026 7 min read

About this article

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

Back to Articles Text-Generation Pipeline on Intel® Gaudi® 2 AI Accelerator Published February 29, 2024 Update on GitHub Upvote 1 Siddhant Jagtap siddjags Follow guest With the Generative AI (GenAI) revolution in full swing, text-generation with open-source transformer models like Llama 2 has become the talk of the town. AI enthusiasts as well as developers are looking to leverage the generative abilities of such models for their own use cases and applications. This article shows how easy it is to generate text with the Llama 2 family of models (7b, 13b and 70b) using Optimum Habana and a custom pipeline class – you'll be able to run the models with just a few lines of code! This custom pipeline class has been designed to offer great flexibility and ease of use. Moreover, it provides a high level of abstraction and performs end-to-end text-generation which involves pre-processing and post-processing. There are multiple ways to use the pipeline - you can run the run_pipeline.py script from the Optimum Habana repository, add the pipeline class to your own python scripts, or initialize LangChain classes with it. Prerequisites Since the Llama 2 models are part of a gated repo, you need to request access if you haven't done it already. First, you have to visit the Meta website and accept the terms and conditions. After you are granted access by Meta (it can take a day or two), you have to request access in Hugging Face, using the same email address you provided in the Meta form...

Originally published on February 15, 2026. Curated by AI News.

Open Source Ai

Granite 4.0 3B Vision: Compact Multimodal Intelligence for Enterprise Documents

A Blog post by IBM Granite on Hugging Face

Hugging Face Blog · 7 min · about 7 hours ago

Llms

My AI spent last night modifying its own codebase

I've been working on a local AI system called Apis that runs completely offline through Ollama. During a background run, Apis identified ...

Reddit - Artificial Intelligence · 1 min · about 12 hours ago

Llms

Depth-first pruning seems to transfer from GPT-2 to Llama (unexpectedly well)

TL;DR: Removing the right transformer layers (instead of shrinking all layers) gives smaller, faster models with minimal quality loss — a...

Reddit - Artificial Intelligence · 1 min · about 14 hours ago

Llms

[2603.16430] EngGPT2: Sovereign, Efficient and Open Intelligence

Abstract page for arXiv paper 2603.16430: EngGPT2: Sovereign, Efficient and Open Intelligence

arXiv - AI · 4 min · about 15 hours ago

Text-Generation Pipeline on Intel® Gaudi® 2 AI Accelerator

About this article

Related Articles

Granite 4.0 3B Vision: Compact Multimodal Intelligence for Enterprise Documents

My AI spent last night modifying its own codebase

Depth-first pruning seems to transfer from GPT-2 to Llama (unexpectedly well)

[2603.16430] EngGPT2: Sovereign, Efficient and Open Intelligence

No comments

Stay updated with AI News