Accelerate StarCoder with 🤗 Optimum Intel on Xeon: Q8/Q4 and Speculative Decoding
Published January 30, 2024

Authors: Ofir Zafrir, Ella Charlaix, Igor Margulis, Daniel Korat, Jonathan Mamou, Guy Boudoukh, Oren Pereg, Moshe Wasserblat, Haihao Shen, Ahmad Yasin, FanZhao

Introduction

Code generation models have recently become very popular, especially with the release of state-of-the-art open-source models such as BigCode’s StarCoder and Meta AI’s Code Llama, and a growing body of work focuses on making Large Language Models (LLMs) more optimized and accessible. In this blog post, we are happy to share our latest results on LLM optimization on Intel Xeon, focusing on the popular code generation LLM StarCoder. StarCoder is a cutting-edge LLM designed to assist users with a variety of coding tasks such as code completion, bug fixing, code summarization, and even generating code snippets from natural language descriptions. It belongs to the StarCoder family of models, which also includes the StarCoderBase variant. These Large Language Models for Code (Code LLMs) are trained on permissively licensed data from GitHub, covering over 80 programming languages, Git commits, GitHub issues, and Jupyte...
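As background for the speculative decoding results discussed in this post, the following is a minimal, self-contained sketch of the core draft-and-verify loop: a small, fast draft model proposes several tokens, and the large target model checks them in a single pass, keeping the longest agreeing prefix. The `DRAFT_NEXT` and `TARGET_NEXT` lookup tables and the `speculative_decode` function are illustrative stand-ins for real models, not the implementation used in the article.

```python
# Toy speculative decoding sketch. Two hypothetical greedy "models" are
# modeled as next-token lookup tables; they agree on a common prefix and
# then diverge, which is the case speculative decoding exploits.
DRAFT_NEXT = {"def": "fib", "fib": "(", "(": "n", "n": "):", "):": "return"}
TARGET_NEXT = {"def": "fib", "fib": "(", "(": "n", "n": "):", "):": "if"}


def speculative_decode(prompt_token, steps=4):
    """Propose `steps` tokens with the draft model, then let the target
    model verify them, keeping the longest matching prefix."""
    output = [prompt_token]

    # Draft phase: the cheap model proposes a block of tokens greedily.
    proposed = []
    cur = prompt_token
    for _ in range(steps):
        nxt = DRAFT_NEXT.get(cur)
        if nxt is None:
            break
        proposed.append(nxt)
        cur = nxt

    # Verify phase: accept proposals while the target model agrees; at the
    # first disagreement, emit the target's own token instead, so every
    # verification pass still produces at least one valid token.
    cur = prompt_token
    for tok in proposed:
        target_tok = TARGET_NEXT.get(cur)
        if target_tok == tok:
            output.append(tok)
            cur = tok
        else:
            if target_tok is not None:
                output.append(target_tok)
            break
    return output
```

With the tables above, the two models agree on the first four tokens, so a four-token draft is accepted wholesale; a five-token draft is rejected at the fifth position and the target's token is substituted. The speedup in practice comes from the target model scoring all drafted tokens in one forward pass instead of one pass per token.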