[2512.06443] Vec-LUT: Vector Table Lookup for Parallel Ultra-Low-Bit LLM Inference on Edge Devices

Computer Science > Distributed, Parallel, and Cluster Computing

arXiv:2512.06443 (cs)
[Submitted on 6 Dec 2025 (v1), last revised 14 Apr 2026 (this version, v2)]

Title: Vec-LUT: Vector Table Lookup for Parallel Ultra-Low-Bit LLM Inference on Edge Devices

Authors: Xiangyu Li, Chengyu Yin, Weijun Wang, Jianyu Wei, Ting Cao, Yunxin Liu

Abstract: Large language models (LLMs) are increasingly deployed on edge devices. To meet strict resource constraints, real-world deployment has pushed LLM quantization from 8-bit to 4-bit, 2-bit, and now 1.58-bit. Combined with lookup table (LUT)-based inference, CPUs run these ultra-low-bit LLMs even faster than NPUs, opening new opportunities for ubiquitous on-device intelligence. However, this paper identifies that LUT-based inference underutilizes memory bandwidth during parallel inference, which is required for prefilling, test-time scaling, and other multi-token scenarios. The root cause is the scalar LUT paradigm, which performs repetitive and non-contiguous memory accesses for each token. To solve the issue, we propose vector LUT, a new lookup paradigm that constructs a unified LUT across parallel tokens, and performs a single $1 \rightarrow N$ lookup per index. To realize it efficiently, we further introduce (1) Vector LUT-Centric Tensor Layout, and (2) Cache-...
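
The contrast the abstract draws between scalar and vector lookup can be illustrated with a small pure-Python sketch. This is not the authors' implementation: the group size `G`, the helper names (`pack_weights`, `scalar_lut_matmul`, `vector_lut_matmul`), and the use of plain lists are all assumptions made for illustration, and the sketch ignores the tensor layout and cache-related techniques the paper introduces. It only shows the idea that a table shared across N parallel tokens returns N values for each weight index, where a per-token table returns one.

```python
from itertools import product

G = 2  # hypothetical group size: ternary weights packed into one index
PATTERNS = list(product((-1, 0, 1), repeat=G))  # all 3**G weight patterns


def pack_weights(W):
    """Replace every group of G ternary weights with its pattern index."""
    index_of = {p: i for i, p in enumerate(PATTERNS)}
    return [[index_of[tuple(row[k:k + G])] for k in range(0, len(row), G)]
            for row in W]


def scalar_lut_matmul(W_idx, X):
    """Scalar LUT: tables are built per token, one value per lookup."""
    out = []
    for x in X:  # X holds N token activation vectors of length K
        # tables[g][i] = dot(pattern i, activations of group g) for this token
        tables = [[sum(w * a for w, a in zip(p, x[k:k + G])) for p in PATTERNS]
                  for k in range(0, len(x), G)]
        out.append([sum(tables[g][i] for g, i in enumerate(row))
                    for row in W_idx])
    return out  # N x M


def vector_lut_matmul(W_idx, X):
    """Vector LUT: one table shared by all tokens, N values per lookup."""
    n, K = len(X), len(X[0])
    # tables[g][i] is a length-N vector: the partial sum for every token
    tables = [[[sum(w * a for w, a in zip(p, x[k:k + G])) for x in X]
               for p in PATTERNS]
              for k in range(0, K, G)]
    out_t = []
    for row in W_idx:
        acc = [0] * n
        for g, i in enumerate(row):
            vec = tables[g][i]  # a single 1 -> N lookup
            acc = [a + v for a, v in zip(acc, vec)]
        out_t.append(acc)
    return [list(col) for col in zip(*out_t)]  # transpose M x N to N x M


W = [[1, 0, -1, 1], [0, 0, 1, -1]]  # M=2 output rows, K=4 ternary weights
X = [[1, 2, 3, 4], [5, 6, 7, 8]]    # N=2 parallel tokens
W_idx = pack_weights(W)
scalar_result = scalar_lut_matmul(W_idx, X)
vector_result = vector_lut_matmul(W_idx, X)
print(scalar_result, vector_result)
```

Both functions compute the same product. In the vector variant each weight index is read once and yields the partial sums for all N tokens as one run of values, which is the property the abstract associates with better memory bandwidth use. A real kernel would presumably store that run contiguously and accumulate it with SIMD instructions; that part is outside this sketch.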

Originally published on April 15, 2026. Curated by AI News.
