[2511.06174] LUT-LLM: Efficient Large Language Model Inference with Memory-based Computations on FPGAs
Computer Science > Hardware Architecture
arXiv:2511.06174 (cs)
[Submitted on 9 Nov 2025 (v1), last revised 22 Mar 2026 (this version, v2)]
Title: LUT-LLM: Efficient Large Language Model Inference with Memory-based Computations on FPGAs
Authors: Zifan He, Shengyu Ye, Rui Ma, Yang Wang, Jason Cong
Abstract: The rapid development of large language models (LLMs) has greatly enhanced everyday applications. Many FPGA-based accelerators, which offer flexibility for fine-grained data control, have shown superior speed and energy efficiency compared to GPUs, but recent GPU-specific optimizations have diminished this advantage. When limited to arithmetic-based computation, FPGAs often underperform GPUs because they have comparatively fewer computational resources. To address this challenge, we exploit a key advantage of FPGAs over GPUs: abundant distributed on-chip memory embedded among computational units. We believe that shifting LLM inference from arithmetic-based to memory-based computation through table lookups can improve efficiency on FPGAs enough to compete with GPUs. However, existing methods are inefficient, or are unable to scale and deploy language models, due to limitations in algorithm and architecture design. This paper introduces LUT-LLM, the first FPGA accelerator that deploys a 1B+ language model with memor...
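
To make the idea of replacing arithmetic with table lookups concrete, the following minimal Python sketch computes a dot product using only table reads and additions. It is a generic illustration and not the LUT-LLM algorithm, whose details are not given in the text above. The 4-bit activation codes and the names `LEVELS`, `build_tables`, `lut_dot`, and `arithmetic_dot` are assumptions made for this example.

```python
# Generic illustration of lookup-based (memory-based) computation.
# This is NOT the LUT-LLM algorithm; every name here is hypothetical.

LEVELS = 16  # assumption: activations are quantized to 4-bit codes, 0..15


def build_tables(weights):
    """Precompute weight * code for every weight and every activation code."""
    return [[w * code for code in range(LEVELS)] for w in weights]


def lut_dot(tables, codes):
    """Dot product using only table reads and additions (no multiplies)."""
    return sum(table[code] for table, code in zip(tables, codes))


def arithmetic_dot(weights, codes):
    """Reference multiply-accumulate version, for comparison."""
    return sum(w * code for w, code in zip(weights, codes))


weights = [3, -2, 5, 1]   # example integer weights
codes = [4, 15, 0, 7]     # example quantized activation codes
tables = build_tables(weights)
result = lut_dot(tables, codes)

# The lookup-based result matches the arithmetic result.
assert result == arithmetic_dot(weights, codes)
```

In this sketch each weight costs `LEVELS` table entries, so multipliers are traded for memory capacity. That trade is only attractive when on-chip memory is plentiful, which is the FPGA advantage the abstract points to.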