[2603.22329] Trained Persistent Memory for Frozen Decoder-Only LLMs
arXiv:2603.22329 (cs)
Computer Science > Machine Learning
[Submitted on 20 Mar 2026]

Title: Trained Persistent Memory for Frozen Decoder-Only LLMs
Authors: Hong Jeong

Abstract: Decoder-only language models are stateless: hidden representations are discarded after every forward pass and nothing persists across sessions. Jeong (2026a) showed that trained memory adapters give a frozen encoder-decoder backbone persistent latent-space memory, building on the lateral-memory framework of Jeong (2026b,c). Here we ask whether the same principle transfers to the decoder-only setting, where no cross-attention pathway exists and memory must enter through self-attention alone. We adapt six methods -- prefix, parallel cross-attention, KV extension, Hebbian memory, context-gated branch, and slot-based sparse write -- to a frozen GPT-2, training only a small adapter $\theta_{mem}$. The write rule is shared; only the read injection changes from decoder cross-attention to self-attention KV prefix or parallel branch. On LoCoMo we find a striking inductive-bias dichotomy: at $1\times$ capacity, three methods with strong architectural priors -- cross-attention (M.2), Hebbian (M.4), and slot write (M.6) -- achieve retained-memory scores of $7$-$18\%$ and knowledge gains $\Delta K$ of $7$-$10$, while the other three fail ($< 0.4\%$). At $10\times$ capacity all six converge, ...
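The abstract's core read mechanism, injecting trained memory into a frozen decoder through a self-attention KV prefix, can be sketched as follows. This is a minimal single-head illustration, not the paper's implementation: the names `mem_k`/`mem_v` (the trainable slots playing the role of $\theta_{mem}$) and all shapes are assumptions for exposition; the backbone weights `Wq`, `Wk`, `Wv` would stay frozen while only the memory slots are trained.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax over the key dimension
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_with_kv_prefix(x, Wq, Wk, Wv, mem_k, mem_v):
    """Single-head self-attention where M trained memory slots are
    prepended to the keys/values of the (frozen) context projections.

    x:            (T, d) hidden states from the frozen backbone
    Wq, Wk, Wv:   (d, d) frozen projection weights
    mem_k, mem_v: (M, d) trainable memory slots (the adapter parameters)
    """
    q = x @ Wq                                    # (T, d) queries, frozen path
    k = np.concatenate([mem_k, x @ Wk], axis=0)   # (M+T, d): memory keys first
    v = np.concatenate([mem_v, x @ Wv], axis=0)   # (M+T, d): memory values first
    scores = (q @ k.T) / np.sqrt(x.shape[-1])     # (T, M+T) attention logits
    return softmax(scores) @ v                    # (T, d) memory-conditioned output

# Hypothetical shapes for illustration only.
rng = np.random.default_rng(0)
d, T, M = 16, 5, 4
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
mem_k = rng.standard_normal((M, d)) * 0.1
mem_v = rng.standard_normal((M, d)) * 0.1
x = rng.standard_normal((T, d))
out = self_attention_with_kv_prefix(x, Wq, Wk, Wv, mem_k, mem_v)
```

In this sketch, gradients would flow only into `mem_k` and `mem_v`, matching the abstract's setup of training a small adapter while the backbone stays frozen; the paper's other read variants (e.g. the parallel branch) would replace the concatenation with a separate attention path.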