
[2503.12988] ROMA: a Read-Only-Memory-based Accelerator for QLoRA-based On-Device LLM

arXiv - AI · March 03, 2026 · 4 min read

About this article

Abstract page for arXiv paper 2503.12988: ROMA: a Read-Only-Memory-based Accelerator for QLoRA-based On-Device LLM

Computer Science > Hardware Architecture · arXiv:2503.12988 (cs)

[Submitted on 17 Mar 2025 (v1), last revised 2 Mar 2026 (this version, v2)]

Title: ROMA: a Read-Only-Memory-based Accelerator for QLoRA-based On-Device LLM

Authors: Wenqiang Wang, Yijia Zhang, Zikai Zhang, Guanting Huo, Hao Liang, Shijie Cao, Ningyi Xu

Abstract: As large language models (LLMs) demonstrate powerful capabilities, deploying them on edge devices has become increasingly crucial, offering advantages in privacy and real-time interaction. QLoRA has emerged as the standard approach for on-device LLMs, leveraging quantized models to reduce memory and computational costs while utilizing LoRA for task-specific adaptability. In this work, we propose ROMA, a QLoRA accelerator with a hybrid storage architecture that uses ROM for quantized base models and SRAM for LoRA weights and KV cache. Our insight is that the quantized base model is stable and converged, making it well-suited for ROM storage. Meanwhile, LoRA modules offer the flexibility to adapt to new data without requiring updates to the base model. To further reduce the area cost of ROM, we introduce a novel B-ROM design and integrate it with the compute unit to form a fused cell for efficient use of chip resources. ROMA can effectively store both a 4-bit 3B and a 2-bit 8B LLaMA model e...
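
The hybrid storage idea rests on a split inside every QLoRA layer: the quantized base weights never change after deployment, while the small LoRA matrices can be rewritten for new tasks. The sketch below illustrates that split for a single linear layer. It is not the ROMA hardware design, and its details are our assumptions rather than the paper's: symmetric per-tensor 4-bit integer quantization with one scale (QLoRA itself uses NF4 with block-wise scales), a LoRA update scaled by alpha / r, and plain Python lists instead of tensors. The frozen dataclass loosely stands in for ROM-resident weights and the mutable adapter for SRAM-resident ones; all names (`FrozenBase`, `LoRAAdapter`, `qlora_linear`, `weight_bytes`) are hypothetical.

```python
"""Illustrative QLoRA-style linear layer in pure Python.

Hypothetical sketch, not the ROMA design: symmetric per-tensor 4-bit
integer quantization with a single scale, and a rank-r LoRA update
scaled by alpha / r.
"""
from dataclasses import FrozenInstanceError, dataclass


def quantize_4bit(weights, scale):
    """Round float weights to signed 4-bit integers clamped to [-8, 7]."""
    return tuple(
        tuple(max(-8, min(7, round(w / scale))) for w in row) for row in weights
    )


@dataclass(frozen=True)
class FrozenBase:
    """Quantized base weights: fixed after creation (stands in for ROM)."""
    q: tuple      # rows of 4-bit integer codes, shape (out, in)
    scale: float  # one dequantization scale for the whole tensor

    def matvec(self, x):
        # Per-tensor scale factors out: (scale * q) @ x == scale * (q @ x)
        return [self.scale * sum(c * xj for c, xj in zip(row, x)) for row in self.q]


@dataclass
class LoRAAdapter:
    """Low-rank update B @ A: small and rewritable (stands in for SRAM)."""
    A: list       # shape (r, in)
    B: list       # shape (out, r)
    alpha: float

    def matvec(self, x):
        r = len(self.A)
        ax = [sum(a * xj for a, xj in zip(row, x)) for row in self.A]
        return [(self.alpha / r) * sum(b * h for b, h in zip(row, ax)) for row in self.B]


def qlora_linear(base, adapter, x):
    """y = dequant(W_q) x + (alpha / r) * B A x."""
    return [b + d for b, d in zip(base.matvec(x), adapter.matvec(x))]


def weight_bytes(n_params, bits):
    """Raw weight storage in bytes, ignoring scales and other metadata."""
    return n_params * bits // 8


if __name__ == "__main__":
    base = FrozenBase(q=quantize_4bit([[0.5, -0.25], [0.75, 0.0]], 0.125), scale=0.125)
    adapter = LoRAAdapter(A=[[1.0, 0.0]], B=[[0.0], [0.5]], alpha=2.0)
    print(qlora_linear(base, adapter, [1.0, 2.0]))  # [0.0, 1.75]

    try:
        base.scale = 1.0  # the base model cannot be rewritten in place
    except FrozenInstanceError:
        print("base weights are read-only")

    adapter.B[1][0] = 0.25  # task adaptation only touches the adapter
    print(qlora_linear(base, adapter, [1.0, 2.0]))  # [0.0, 1.25]
```

The `weight_bytes` helper gives a rough lower bound on raw weight storage if "3B" and "8B" are read as exactly 3 and 8 billion parameters and quantization scales are ignored: about 1.5 × 10^9 bytes for a 4-bit 3B model and 2 × 10^9 bytes for a 2-bit 8B model. This is a back-of-the-envelope estimate, not a figure reported in the paper.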

Originally published on March 03, 2026. Curated by AI News.

