[2510.03215] Cache-to-Cache: Direct Semantic Communication Between Large Language Models
Computer Science > Computation and Language
arXiv:2510.03215 (cs)
[Submitted on 3 Oct 2025 (v1), last revised 2 Mar 2026 (this version, v2)]

Title: Cache-to-Cache: Direct Semantic Communication Between Large Language Models
Authors: Tianyu Fu, Zihan Min, Hanling Zhang, Jichao Yan, Guohao Dai, Wanli Ouyang, Yu Wang

Abstract: Multi-LLM systems harness the complementary strengths of diverse Large Language Models, achieving performance and efficiency gains that are not attainable by a single model. In existing designs, LLMs communicate through text, forcing internal representations to be transformed into output token sequences. This process both loses rich semantic information and incurs token-by-token generation latency. Motivated by these limitations, we ask: Can LLMs communicate beyond text? Oracle experiments show that enriching the KV-cache semantics can improve response quality without increasing cache size, supporting the KV-cache as an effective medium for inter-model communication. Thus, we propose Cache-to-Cache (C2C), a new paradigm for direct semantic communication between LLMs. C2C uses a neural network to project and fuse the source model's KV-cache with that of the target model to enable direct semantic transfer. A learnable gating mechanism selects the target layers that benefit from cache communicati...
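The mechanism described in the abstract, projecting the source model's KV-cache into the target model's cache space and blending the two under a learnable gate, can be sketched in miniature. This is an illustrative toy with NumPy, not the authors' implementation: the function name `fuse_kv`, the flat `(seq, dim)` cache layout, the single projection matrix `W`, and the scalar per-layer gate are all simplifying assumptions for exposition.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse_kv(src_kv, tgt_kv, W, gate_logit):
    """Blend a projected source KV-cache slice into the target cache.

    src_kv:     (seq, d_src) source-model cache slice (hypothetical layout)
    tgt_kv:     (seq, d_tgt) target-model cache slice
    W:          (d_src, d_tgt) learned projection matrix (here: random stand-in)
    gate_logit: scalar; its sigmoid acts as the per-layer mixing weight,
                so training can push a layer toward "use" or "ignore" the
                communicated cache.
    """
    g = 1.0 / (1.0 + np.exp(-gate_logit))   # gate in (0, 1)
    return tgt_kv + g * (src_kv @ W)        # residual-style fusion

# Toy dimensions: a 64-dim source cache fused into a 128-dim target cache.
src = rng.standard_normal((10, 64))
tgt = rng.standard_normal((10, 128))
W = rng.standard_normal((64, 128)) * 0.01
fused = fuse_kv(src, tgt, W, gate_logit=0.0)  # sigmoid(0) = 0.5 mixing
print(fused.shape)  # (10, 128): same shape as the target cache
```

Note the design point this toy preserves: the fused cache has exactly the target model's cache shape, so semantic information is injected without growing the cache, and a gate driven to a large negative logit recovers the target cache unchanged.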