[2503.12988] ROMA: a Read-Only-Memory-based Accelerator for QLoRA-based On-Device LLM

[2503.12988] ROMA: a Read-Only-Memory-based Accelerator for QLoRA-based On-Device LLM

arXiv - AI 4 min read

About this article

Abstract page for arXiv paper 2503.12988: ROMA: a Read-Only-Memory-based Accelerator for QLoRA-based On-Device LLM

Computer Science > Hardware Architecture arXiv:2503.12988 (cs) [Submitted on 17 Mar 2025 (v1), last revised 2 Mar 2026 (this version, v2)] Title:ROMA: a Read-Only-Memory-based Accelerator for QLoRA-based On-Device LLM Authors:Wenqiang Wang, Yijia Zhang, Zikai Zhang, Guanting Huo, Hao Liang, Shijie Cao, Ningyi Xu View a PDF of the paper titled ROMA: a Read-Only-Memory-based Accelerator for QLoRA-based On-Device LLM, by Wenqiang Wang and 6 other authors View PDF HTML (experimental) Abstract:As large language models (LLMs) demonstrate powerful capabilities, deploying them on edge devices has become increasingly crucial, offering advantages in privacy and real-time interaction. QLoRA has emerged as the standard approach for on-device LLMs, leveraging quantized models to reduce memory and computational costs while utilizing LoRA for task-specific adaptability. In this work, we propose ROMA, a QLoRA accelerator with a hybrid storage architecture that uses ROM for quantized base models and SRAM for LoRA weights and KV cache. Our insight is that the quantized base model is stable and converged, making it well-suited for ROM storage. Meanwhile, LoRA modules offer the flexibility to adapt to new data without requiring updates to the base model. To further reduce the area cost of ROM, we introduce a novel B-ROM design and integrate it with the compute unit to form a fused cell for efficient use of chip resources. ROMA can effectively store both a 4-bit 3B and a 2-bit 8B LLaMA model e...

Originally published on March 03, 2026. Curated by AI News.

Related Articles

What is AI, how do apps like ChatGPT work and why are there concerns?
Llms

What is AI, how do apps like ChatGPT work and why are there concerns?

AI is transforming modern life, but some critics worry about its potential misuse and environmental impact.

AI News - General · 7 min ·
[2603.29957] Think Anywhere in Code Generation
Llms

[2603.29957] Think Anywhere in Code Generation

Abstract page for arXiv paper 2603.29957: Think Anywhere in Code Generation

arXiv - Machine Learning · 3 min ·
[2603.16880] NeuroNarrator: A Generalist EEG-to-Text Foundation Model for Clinical Interpretation via Spectro-Spatial Grounding and Temporal State-Space Reasoning
Llms

[2603.16880] NeuroNarrator: A Generalist EEG-to-Text Foundation Model for Clinical Interpretation via Spectro-Spatial Grounding and Temporal State-Space Reasoning

Abstract page for arXiv paper 2603.16880: NeuroNarrator: A Generalist EEG-to-Text Foundation Model for Clinical Interpretation via Spectr...

arXiv - Machine Learning · 4 min ·
[2512.21106] Semantic Refinement with LLMs for Graph Representations
Llms

[2512.21106] Semantic Refinement with LLMs for Graph Representations

Abstract page for arXiv paper 2512.21106: Semantic Refinement with LLMs for Graph Representations

arXiv - Machine Learning · 4 min ·
More in Llms: This Week Guide Trending

No comments

No comments yet. Be the first to comment!

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime