Accelerating SD Turbo and SDXL Turbo Inference with ONNX Runtime and Olive
Published January 15, 2024
By Sophie Schoenmeyer, Tianlei Wu, and Morgan Funtowicz

Introduction

SD Turbo and SDXL Turbo are two fast generative text-to-image models capable of producing viable images in as little as one step, a significant improvement over the 30+ steps often required by previous Stable Diffusion models. SD Turbo is a distilled version of Stable Diffusion 2.1, and SDXL Turbo is a distilled version of SDXL 1.0. We have previously shown how to accelerate Stable Diffusion inference with ONNX Runtime. ONNX Runtime not only provides performance benefits when used with SD Turbo and SDXL Turbo, it also makes the models accessible from languages other than Python, such as C# and Java.

Performance gains

In this post, we introduce optimizations in the ONNX Runtime CUDA and TensorRT execution providers that significantly speed up SD Turbo and SDXL Turbo inference on NVIDIA GPUs. ONNX Runtime outperformed PyTorch for every (batch size, number of steps) combination tested, with throughput gains as high as 229% for SDXL Turbo and 120% for SD Turbo. The ONNX Runtime CUDA execution provider performs particularly well with dynamic input shapes, and it also demonstrates a marked improvement over PyTorch with static shapes.

How to run SD Turbo and SDXL Turbo

To accelerate inference with the ONNX Runt...
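As a minimal sketch of the kind of pipeline the article describes, the snippet below runs SD Turbo through ONNX Runtime using Hugging Face Optimum. This is an illustration, not the article's exact script: the `stabilityai/sd-turbo` model ID and the one-step, guidance-free settings follow the SD Turbo model card, and the CUDA provider argument assumes `onnxruntime-gpu` is installed.

```python
def turbo_inference_kwargs(steps: int = 1) -> dict:
    """Generation settings for the Turbo models.

    SD Turbo and SDXL Turbo are distilled to produce images in 1-4 steps
    without classifier-free guidance, so guidance_scale is set to 0.0.
    """
    return {"num_inference_steps": steps, "guidance_scale": 0.0}


if __name__ == "__main__":
    # Optimum wraps diffusers pipelines around ONNX Runtime sessions.
    from optimum.onnxruntime import ORTStableDiffusionPipeline

    pipeline = ORTStableDiffusionPipeline.from_pretrained(
        "stabilityai/sd-turbo",
        export=True,                       # convert the PyTorch weights to ONNX on load
        provider="CUDAExecutionProvider",  # assumption: onnxruntime-gpu is available
    )
    image = pipeline(
        "A cinematic photo of a lighthouse at dawn",
        **turbo_inference_kwargs(),
    ).images[0]
    image.save("sd_turbo.png")
```

Swapping the provider string (for example to `"TensorrtExecutionProvider"`) selects a different ONNX Runtime execution provider without changing the rest of the pipeline code.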