[2602.15902] Doc-to-LoRA: Learning to Instantly Internalize Contexts

Summary

The paper presents Doc-to-LoRA (D2L), a hypernetwork that maps a given context to a LoRA adapter for a target LLM in a single forward pass, so that later queries can be answered without re-consuming the context, cutting inference latency and KV-cache memory.

Why It Matters

As LLMs become integral to more applications, long inputs remain a bottleneck: the quadratic attention cost of Transformers makes long-context inference slow and memory-intensive. By folding a context into a small adapter once, Doc-to-LoRA sidesteps those per-query costs, making LLMs more efficient and easier to adapt for real-world tasks.

Key Takeaways

  • Doc-to-LoRA internalizes a context into a LoRA adapter in a single forward pass (see the sketch after this list).
  • It significantly reduces latency and KV-cache memory consumption during inference.
  • Under limited compute, it outperforms traditional context distillation on real-world QA tasks.
  • D2L enables rapid adaptation of LLMs, e.g. for personalized interactions.
  • The approach handles sequence lengths exceeding the target LLM's native context window by more than 4x.
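The takeaways above don't spell out the architecture. As a rough mental model, the following PyTorch sketch shows the general pattern of a hypernetwork emitting LoRA factors for one frozen linear layer of a target model; the module names, dimensions, and pooled-context input are all illustrative assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn

# Illustrative shapes for a single target linear layer of a frozen LLM.
D_MODEL, RANK, D_CTX = 4096, 8, 1024  # assumed, not from the paper

class LoRAHypernetwork(nn.Module):
    """Maps a pooled context embedding to LoRA factors (A, B)."""
    def __init__(self):
        super().__init__()
        self.to_A = nn.Linear(D_CTX, RANK * D_MODEL)  # emits down-projection
        self.to_B = nn.Linear(D_CTX, D_MODEL * RANK)  # emits up-projection

    def forward(self, ctx: torch.Tensor):
        # ctx: (batch, D_CTX), e.g. a mean-pooled encoding of the document
        A = self.to_A(ctx).view(-1, RANK, D_MODEL)
        B = self.to_B(ctx).view(-1, D_MODEL, RANK)
        return A, B

def lora_linear(x, W, A, B, scale=1.0):
    # Frozen base weight W plus the generated low-rank update B @ A.
    return x @ W.T + scale * (x @ A.transpose(-1, -2)) @ B.transpose(-1, -2)

hyper = LoRAHypernetwork()
A, B = hyper(torch.randn(1, D_CTX))   # one forward pass -> adapter
x = torch.randn(1, 10, D_MODEL)       # later queries, context not re-read
W = torch.randn(D_MODEL, D_MODEL)     # frozen base projection
y = lora_linear(x, W, A, B)           # (1, 10, D_MODEL)
```

In the paper, D2L is meta-trained to approximate context distillation, i.e. the generated adapter should make the bare model behave as if it had the full context in its prompt; the snippet only shows the weight-generation and application mechanics.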

Computer Science > Computation and Language

arXiv:2602.15902 (cs) · Submitted on 13 Feb 2026

Title: Doc-to-LoRA: Learning to Instantly Internalize Contexts
Authors: Rujikorn Charakorn, Edoardo Cetin, Shinnosuke Uesaka, Robert Tjarko Lange

Abstract: Long input sequences are central to in-context learning, document understanding, and multi-step reasoning of Large Language Models (LLMs). However, the quadratic attention cost of Transformers makes inference memory-intensive and slow. While context distillation (CD) can transfer information into model parameters, per-prompt distillation is impractical due to training costs and latency. To address these limitations, we propose Doc-to-LoRA (D2L), a lightweight hypernetwork that meta-learns to perform approximate CD within a single forward pass. Given an unseen prompt, D2L generates a LoRA adapter for a target LLM, enabling subsequent queries to be answered without re-consuming the original context, reducing latency and KV-cache memory consumption during inference of the target LLM. On a long-context needle-in-a-haystack task, D2L successfully learns to map contexts into adapters that store the needle information, achieving near-perfect zero-shot accuracy at sequence lengths exceeding the target LLM's native context window by more than 4x. On real-world QA datasets with limited compute, D2L outper...
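To make the KV-cache claim concrete, a back-of-envelope comparison helps. The model shape below (a Llama-7B-like configuration in fp16, without grouped-query attention) and the adapter rank are assumptions chosen for illustration, not figures from the paper.

```python
# Rough memory comparison: cached context vs. generated LoRA adapter.
# All model dimensions here are illustrative assumptions.
layers, d_model, rank, fp16 = 32, 4096, 8, 2   # fp16 = 2 bytes per value
seq_len = 100_000

# KV cache: one key and one value vector per layer per token.
kv_bytes = 2 * layers * d_model * seq_len * fp16
print(f"KV cache for {seq_len:,} tokens: {kv_bytes / 1e9:.1f} GB")  # ~52.4 GB

# Rank-8 LoRA on the q and v projections of every layer:
# each adapted projection adds A (rank x d_model) + B (d_model x rank).
lora_params = layers * 2 * (2 * rank * d_model)
print(f"Generated adapter: {lora_params * fp16 / 1e6:.1f} MB")      # ~8.4 MB
```

Under these assumptions the generated adapter is roughly four orders of magnitude smaller than the KV cache it replaces, which is where the latency and memory savings come from.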
