[2603.04759] Stacked from One: Multi-Scale Self-Injection for Context Window Extension
Computer Science > Computation and Language
arXiv:2603.04759 (cs) [Submitted on 5 Mar 2026]
Title: Stacked from One: Multi-Scale Self-Injection for Context Window Extension
Authors: Wei Han, Pan Zhou, Shuicheng Yan
Abstract: The limited context window of contemporary large language models (LLMs) remains a primary bottleneck for their broader application across diverse domains. Although continual pre-training on long-context data offers a straightforward solution, it incurs prohibitive data acquisition and computational costs. To address this challenge, we propose SharedLLM, a novel framework based on multi-grained context compression and query-aware information acquisition. SharedLLM comprises two stacked short-context LLMs: a lower model serving as a compressor and an upper model acting as a decoder. The lower model compresses long inputs into compact, multi-grained representations, which are then forwarded to the upper model for context-aware processing. To maximize efficiency, this information transfer occurs exclusively at the lowest layers, bypassing lengthy forward passes and redundant cross-attention operations. This entire process, wherein the upper and lower models are derived from layers of the same underlying LLM, is termed *self-injection*. To support this architecture, a specialized t...
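The compress-then-inject flow described in the abstract can be illustrated with a deliberately simplified, hypothetical sketch. Here the "lower model" is stood in for by mean pooling over fixed-size chunks, and "injection at the lowest layers" is reduced to prepending the compact summaries to the query tokens; the actual SharedLLM uses shared transformer layers and cross-attention, none of which is reproduced here.

```python
# Toy sketch of the self-injection idea (hypothetical simplification;
# the real SharedLLM compresses with shared transformer layers, not pooling).

def compress(chunk):
    """Stand-in for the lower model (compressor): reduce a chunk of token
    embeddings to one compact summary vector via mean pooling."""
    dim = len(chunk[0])
    return [sum(tok[d] for tok in chunk) / len(chunk) for d in range(dim)]

def self_injection(long_context, query, chunk_size=4):
    """Split a long input into chunks, compress each, and hand only the
    compact summaries (plus the query) to the upper model's lowest layer."""
    chunks = [long_context[i:i + chunk_size]
              for i in range(0, len(long_context), chunk_size)]
    summaries = [compress(c) for c in chunks]
    # The upper model never sees the full long context, only the summaries.
    return summaries + query

# Usage: 8 context "tokens" of dimension 2 collapse to 2 summaries,
# so the upper model processes 2 + 1 = 3 vectors instead of 9.
ctx = [[float(i), float(i)] for i in range(8)]
q = [[100.0, 100.0]]
injected = self_injection(ctx, q)
print(len(injected))  # 3
```

The design point this toy preserves is the efficiency claim in the abstract: the upper model's sequence length scales with the number of compressed summaries, not with the raw length of the long input.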