[2505.16950] Bottlenecked Transformers: Periodic KV Cache Consolidation for Generalised Reasoning
Computer Science > Machine Learning
arXiv:2505.16950 (cs)
[Submitted on 22 May 2025 (v1), last revised 25 Mar 2026 (this version, v4)]

Title: Bottlenecked Transformers: Periodic KV Cache Consolidation for Generalised Reasoning
Authors: Adnan Oomerjee, Zafeirios Fountas, Haitham Bou-Ammar, Jun Wang

Abstract: Transformer LLMs have been shown to exhibit strong reasoning ability that scales with inference-time compute, most prominently through token-space "thinking" chains of thought. A growing line of work pushes extra computation into the model's latent space, which we term Auxiliary Latent-Space Computation (ALSC). Existing ALSC methods largely fall into three buckets: (i) token-mediated latent rollouts, (ii) residual/activation steering, and (iii) memory (KV) compression. An underexplored alternative is memory consolidation/reconsolidation: two processes in the brain that stabilise newly formed memory traces and, upon recall, transiently render established traces plastic so that they can integrate new contextual information before restabilising. In Transformer LLMs, this can be seen as analogous to performing in-place rewrites of new KV segments and of recalled past segments. In this work, we give a theoretical justification as to why memory (re)consolidation via...
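To make the KV-cache analogy concrete, here is a minimal, hypothetical sketch of "periodic consolidation": every `period` cached tokens, the newest KV segment is rewritten in place by a transform. The function names, the random mixing matrix, and the segment schedule are all illustrative assumptions standing in for whatever learned consolidation module the paper actually proposes; this is not the authors' implementation.

```python
import numpy as np

def consolidate_segment(keys, values, start, end, rng):
    """Hypothetical in-place rewrite of one KV segment.

    A random near-identity mixing matrix stands in for a learned
    consolidation transform; it blends information across the
    positions of the segment while leaving the cache shape intact."""
    seg_len = end - start
    mix = np.eye(seg_len) * 0.9 + rng.standard_normal((seg_len, seg_len)) * 0.01
    keys[start:end] = mix @ keys[start:end]
    values[start:end] = mix @ values[start:end]

def periodic_consolidation(keys, values, period, seed=0):
    """Every `period` tokens, rewrite the most recent segment in place."""
    rng = np.random.default_rng(seed)
    n = keys.shape[0]
    for end in range(period, n + 1, period):
        consolidate_segment(keys, values, end - period, end, rng)
    return keys, values

# Toy cache: 8 cached tokens, head dimension 4.
k = np.ones((8, 4))
v = np.ones((8, 4))
k2, v2 = periodic_consolidation(k, v, period=4)
print(k2.shape)  # cache shape is preserved: (8, 4)
```

The key property illustrated is that consolidation changes the *contents* of cached segments without growing or shrinking the cache, which is what distinguishes it from the KV-compression bucket of ALSC methods mentioned in the abstract.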