[2511.06174] LUT-LLM: Efficient Large Language Model Inference with Memory-based Computations on FPGAs

Computer Science > Hardware Architecture
arXiv:2511.06174 (cs)
[Submitted on 9 Nov 2025 (v1), last revised 22 Mar 2026 (this version, v2)]

Title: LUT-LLM: Efficient Large Language Model Inference with Memory-based Computations on FPGAs
Authors: Zifan He, Shengyu Ye, Rui Ma, Yang Wang, Jason Cong

Abstract: The rapid development of large language models (LLMs) has greatly enhanced everyday applications. While many FPGA-based accelerators, with flexibility for fine-grained data control, exhibit superior speed and energy efficiency compared to GPUs, recent GPU-specific optimizations have diminished this advantage. When limited to arithmetic-based computation, FPGAs often underperform GPUs because they have comparatively fewer computational resources. To address this challenge, we exploit a key advantage of FPGAs over GPUs: abundant distributed on-chip memory embedded among computational units. We believe that shifting LLM inference from arithmetic-based to memory-based computations through table lookups can improve the efficiency on FPGAs to compete with GPUs. However, existing methods are inefficient or unable to scale and deploy language models due to algorithm and architecture design limitations. This paper introduces LUT-LLM, the first FPGA accelerator that deploys a 1B+ language model with memor...
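To make the abstract's contrast between arithmetic-based and memory-based computation concrete, here is a minimal Python sketch of the general idea: products of low-bit operands are precomputed once and then read from a table instead of being multiplied at run time. This is an illustration only, not the LUT-LLM design. The 4-bit unsigned operand width and the names PRODUCT_TABLE, dot_arithmetic, and dot_lookup are assumptions introduced for this example; the truncated abstract does not describe the paper's actual table construction.

```python
# Hypothetical illustration of memory-based computation via table lookup.
# Assumption: weights and activations are 4-bit unsigned integers (0..15).
# This is NOT the scheme used by LUT-LLM; it only shows the general idea.

BITS = 4
LEVELS = 1 << BITS  # 16 representable values per operand

# Precompute every possible weight * activation product once.
# On an FPGA, a table like this would live in on-chip memory.
PRODUCT_TABLE = [[w * a for a in range(LEVELS)] for w in range(LEVELS)]


def dot_arithmetic(weights, activations):
    """Reference dot product that performs one multiplication per element."""
    return sum(w * a for w, a in zip(weights, activations))


def dot_lookup(weights, activations):
    """Same dot product, but each product is read from the precomputed table."""
    return sum(PRODUCT_TABLE[w][a] for w, a in zip(weights, activations))


weights = [3, 15, 0, 7]
activations = [2, 1, 9, 4]

# Both paths must agree; only the way each product is obtained differs.
assert dot_lookup(weights, activations) == dot_arithmetic(weights, activations)
```

The table here holds 16 x 16 entries, so it trades a small amount of memory for the removal of every run-time multiplication. How well that trade scales to real model sizes depends on table layout and quantization choices that this sketch does not address.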

Originally published on March 24, 2026. Curated by AI News.
