[2604.05248] Improving Sparse Memory Finetuning
Computer Science > Machine Learning
arXiv:2604.05248 (cs)
[Submitted on 6 Apr 2026]

Title: Improving Sparse Memory Finetuning
Authors: Satyam Goyal, Anirudh Kanchi, Garv Shah, Prakhar Gupta

Abstract: Large Language Models (LLMs) are typically static after training, yet real-world applications require continual adaptation to new knowledge without degrading existing capabilities. Standard approaches to updating models, such as full finetuning or parameter-efficient methods (e.g., LoRA), suffer from a fundamental problem: catastrophic forgetting. Because they modify shared dense representations, they cause interference across tasks. Sparse Memory Finetuning (SMF) offers a promising alternative by localizing updates to a small subset of parameters in explicit memory layers. In this work, we present an open-source pipeline to retrofit existing pretrained models (Qwen-2.5-0.5B) with sparse memory modules, enabling effective continual learning on consumer hardware. We extend prior work by introducing a theoretically grounded slot-selection mechanism based on Kullback-Leibler (KL) divergence, which prioritizes memory updates for informationally "surprising" tokens relative to a background distribution. Our experiments demonstrate that our retrofitted models can acquire new factual knowledge with minimal forgetting of held-out capabilities, validating the sparse update hypothesis.
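The KL-based slot selection described in the abstract can be sketched as follows. This is a hedged illustration, not the paper's implementation: the function names (`kl_divergence`, `select_surprising_tokens`), the uniform background distribution, and the top-k selection rule are all assumptions made for the example; the paper's actual mechanism may differ.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions over the vocabulary.
    Small epsilon clipping avoids log(0)."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def select_surprising_tokens(token_probs, background_probs, k):
    """Score each token position by the KL divergence of its predictive
    distribution from a background distribution, and return the indices
    of the top-k most 'surprising' positions. In an SMF-style setup,
    only the memory slots associated with these positions would be updated."""
    scores = np.array([kl_divergence(p, background_probs) for p in token_probs])
    return sorted(np.argsort(scores)[-k:].tolist())

# Toy example: 4 token positions over a 5-word vocabulary.
rng = np.random.default_rng(0)
background = np.full(5, 0.2)  # assumed uniform background distribution
token_probs = rng.dirichlet(np.ones(5), size=4)
chosen = select_surprising_tokens(token_probs, background, k=2)
```

A distribution identical to the background scores zero KL and is never selected, so updates concentrate on positions carrying genuinely new information, which is the intuition behind sparse, interference-free memory writes.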