Accelerating SD Turbo and SDXL Turbo Inference with ONNX Runtime and Olive
Published January 15, 2024
By Sophie Schoenmeyer, Tianlei Wu, and Morgan Funtowicz

Introduction

SD Turbo and SDXL Turbo are two fast generative text-to-image models capable of producing viable images in as little as one step, a significant improvement over the 30+ steps often required by previous Stable Diffusion models. SD Turbo is a distilled version of Stable Diffusion 2.1, and SDXL Turbo is a distilled version of SDXL 1.0. We have previously shown how to accelerate Stable Diffusion inference with ONNX Runtime. ONNX Runtime not only provides performance benefits when used with SD Turbo and SDXL Turbo, it also makes the models accessible from languages other than Python, such as C# and Java.

Performance gains

In this post, we introduce optimizations in the ONNX Runtime CUDA and TensorRT execution providers that significantly speed up SD Turbo and SDXL Turbo inference on NVIDIA GPUs. ONNX Runtime outperformed PyTorch for every (batch size, number of steps) combination tested, with throughput gains as high as 229% for SDXL Turbo and 120% for SD Turbo. The ONNX Runtime CUDA execution provider performs particularly well with dynamic input shapes, and it also demonstrates a marked improvement over PyTorch with static shapes.

How to run SD Turbo and SDXL Turbo

To accelerate inference with the ONNX Runt...
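As a minimal sketch of the kind of pipeline the article describes, the snippet below runs SD Turbo through ONNX Runtime using Hugging Face Optimum. This is an illustration, not the article's exact script: the `stabilityai/sd-turbo` model ID and the one-step, guidance-free settings follow the SD Turbo model card, and the CUDA provider argument assumes `onnxruntime-gpu` is installed.

```python
def turbo_inference_kwargs(steps: int = 1) -> dict:
    """Generation settings for the Turbo models.

    SD Turbo and SDXL Turbo are distilled to produce images in 1-4 steps
    without classifier-free guidance, so guidance_scale is set to 0.0.
    """
    return {"num_inference_steps": steps, "guidance_scale": 0.0}


if __name__ == "__main__":
    # Optimum wraps diffusers pipelines around ONNX Runtime sessions.
    from optimum.onnxruntime import ORTStableDiffusionPipeline

    pipeline = ORTStableDiffusionPipeline.from_pretrained(
        "stabilityai/sd-turbo",
        export=True,                       # convert the PyTorch weights to ONNX on load
        provider="CUDAExecutionProvider",  # assumption: onnxruntime-gpu is available
    )
    image = pipeline(
        "A cinematic photo of a lighthouse at dawn",
        **turbo_inference_kwargs(),
    ).images[0]
    image.save("sd_turbo.png")
```

Swapping the provider string (for example to `"TensorrtExecutionProvider"`) selects a different ONNX Runtime execution provider without changing the rest of the pipeline code.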