[2510.03215] Cache-to-Cache: Direct Semantic Communication Between Large Language Models
Computer Science > Computation and Language
arXiv:2510.03215 (cs)
[Submitted on 3 Oct 2025 (v1), last revised 2 Mar 2026 (this version, v2)]

Title: Cache-to-Cache: Direct Semantic Communication Between Large Language Models
Authors: Tianyu Fu, Zihan Min, Hanling Zhang, Jichao Yan, Guohao Dai, Wanli Ouyang, Yu Wang

Abstract: Multi-LLM systems harness the complementary strengths of diverse Large Language Models, achieving performance and efficiency gains that are not attainable by a single model. In existing designs, LLMs communicate through text, forcing internal representations to be transformed into output token sequences. This process both loses rich semantic information and incurs token-by-token generation latency. Motivated by these limitations, we ask: Can LLMs communicate beyond text? Oracle experiments show that enriching the KV-cache semantics can improve response quality without increasing cache size, supporting the KV-cache as an effective medium for inter-model communication. Thus, we propose Cache-to-Cache (C2C), a new paradigm for direct semantic communication between LLMs. C2C uses a neural network to project and fuse the source model's KV-cache with that of the target model to enable direct semantic transfer. A learnable gating mechanism selects the target layers that benefit from cache communicati...
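The mechanism described in the abstract, projecting the source model's KV-cache into the target model's cache space and blending the two under a learnable gate, can be sketched in miniature. This is an illustrative toy with NumPy, not the authors' implementation: the function name `fuse_kv`, the flat `(seq, dim)` cache layout, the single projection matrix `W`, and the scalar per-layer gate are all simplifying assumptions for exposition.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse_kv(src_kv, tgt_kv, W, gate_logit):
    """Blend a projected source KV-cache slice into the target cache.

    src_kv:     (seq, d_src) source-model cache slice (hypothetical layout)
    tgt_kv:     (seq, d_tgt) target-model cache slice
    W:          (d_src, d_tgt) learned projection matrix (here: random stand-in)
    gate_logit: scalar; its sigmoid acts as the per-layer mixing weight,
                so training can push a layer toward "use" or "ignore" the
                communicated cache.
    """
    g = 1.0 / (1.0 + np.exp(-gate_logit))   # gate in (0, 1)
    return tgt_kv + g * (src_kv @ W)        # residual-style fusion

# Toy dimensions: a 64-dim source cache fused into a 128-dim target cache.
src = rng.standard_normal((10, 64))
tgt = rng.standard_normal((10, 128))
W = rng.standard_normal((64, 128)) * 0.01
fused = fuse_kv(src, tgt, W, gate_logit=0.0)  # sigmoid(0) = 0.5 mixing
print(fused.shape)  # (10, 128): same shape as the target cache
```

Note the design point this toy preserves: the fused cache has exactly the target model's cache shape, so semantic information is injected without growing the cache, and a gate driven to a large negative logit recovers the target cache unchanged.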