[2602.23197] Fine-Tuning Without Forgetting In-Context Learning: A Theoretical Analysis of Linear Attention Models

arXiv - Machine Learning 3 min read Article

Summary

This paper analyzes how fine-tuning affects in-context learning in linear attention models, identifying conditions under which fine-tuning enhances or degrades few-shot performance on downstream tasks.

Why It Matters

Understanding how fine-tuning affects in-context learning is crucial for optimizing large language models. This research provides theoretical insights that can guide practitioners in maintaining model performance across various tasks, which is essential in the rapidly evolving field of AI.

Key Takeaways

  • Fine-tuning can degrade in-context learning performance if not managed properly.
  • Restricting updates to the value matrix during fine-tuning can preserve in-context learning.
  • Incorporating an auxiliary few-shot loss can enhance performance on the target task but may harm generalization to tasks not seen during fine-tuning.
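The second takeaway can be illustrated with a toy sketch: freeze the attention scoring parameters and run gradient descent on the value matrix alone. The parameterization below (a merged key-query matrix `W_kq` and a value matrix `W_v`), the dimensions, and the target are all illustrative stand-ins, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                   # token dimension (hypothetical)
W_kq = rng.normal(size=(d, d)) / d      # frozen key-query matrix
W_v = rng.normal(size=(d, d)) / d       # value matrix: the only part we update

def linear_attention(X, q, W_kq, W_v):
    """Linear attention output for query q given context tokens X (n x d).
    Softmax is replaced by the identity, so scores enter linearly."""
    scores = X @ W_kq @ q               # (n,) attention scores
    return W_v @ (X.T @ scores)         # value-weighted context sum

X = rng.normal(size=(8, d))             # context tokens
q = rng.normal(size=d)                  # query token
target = rng.normal(size=d)             # toy zero-shot target output

# Because W_kq is frozen, the context summary is fixed during fine-tuning.
ctx = X.T @ (X @ W_kq @ q)
lr = 1.0 / (ctx @ ctx)                  # exact step size for this rank-one problem

W_kq_before = W_kq.copy()
for _ in range(10):
    err = W_v @ ctx - target            # gradient of the zero-shot squared error
    W_v -= lr * np.outer(err, ctx)      # update touches W_v only

assert np.allclose(linear_attention(X, q, W_kq, W_v), target)  # zero-shot loss ~0
assert np.array_equal(W_kq, W_kq_before)  # attention scores (the ICL mechanism) untouched
```

The point of the sketch is structural: the zero-shot objective is driven to zero while the parameters that determine how context tokens are weighted never move, which is the mechanism the takeaway attributes to value-only fine-tuning.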

Computer Science > Computation and Language
arXiv:2602.23197 (cs) · Submitted on 26 Feb 2026

Title: Fine-Tuning Without Forgetting In-Context Learning: A Theoretical Analysis of Linear Attention Models
Authors: Chungpa Lee, Jy-yong Sohn, Kangwook Lee

Abstract: Transformer-based large language models exhibit in-context learning, enabling adaptation to downstream tasks via few-shot prompting with demonstrations. In practice, such models are often fine-tuned to improve zero-shot performance on downstream tasks, allowing them to solve tasks without examples and thereby reducing inference costs. However, fine-tuning can degrade in-context learning, limiting the performance of fine-tuned models on tasks not seen during fine-tuning. Using linear attention models, we provide a theoretical analysis that characterizes how fine-tuning objectives modify attention parameters and identifies conditions under which this leads to degraded few-shot performance. We show that fine-tuning all attention parameters can harm in-context learning, whereas restricting updates to the value matrix improves zero-shot performance while preserving in-context learning. We further show that incorporating an auxiliary few-shot loss enhances in-context learning primarily on the target task, at the expense of degraded...
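The trade-off described at the end of the abstract can be mimicked with a minimal least-squares example: minimize a zero-shot loss plus a weighted auxiliary few-shot loss. Both quadratic objectives and the weight `lam` are hypothetical stand-ins for the paper's losses, chosen only to make the trade-off visible.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3
A_zero, b_zero = rng.normal(size=(d, d)), rng.normal(size=d)  # toy zero-shot task
A_few, b_few = rng.normal(size=(d, d)), rng.normal(size=d)    # toy few-shot task

def zero_loss(w):
    return 0.5 * np.sum((A_zero @ w - b_zero) ** 2)

def few_loss(w):
    return 0.5 * np.sum((A_few @ w - b_few) ** 2)

def fine_tune(lam):
    """Minimizer of zero_loss(w) + lam * few_loss(w), via the normal equations."""
    H = A_zero.T @ A_zero + lam * A_few.T @ A_few
    g = A_zero.T @ b_zero + lam * A_few.T @ b_few
    return np.linalg.solve(H, g)

w_plain = fine_tune(0.0)   # fine-tune on the zero-shot objective alone
w_aux = fine_tune(5.0)     # add the auxiliary few-shot term

# The auxiliary term helps the few-shot objective but costs zero-shot fit:
assert few_loss(w_aux) <= few_loss(w_plain) + 1e-9
assert zero_loss(w_aux) >= zero_loss(w_plain) - 1e-9
```

Because the combined minimizer trades some zero-shot fit for few-shot fit, the inequalities hold for any weight `lam >= 0`; this mirrors, in miniature, the abstract's claim that the auxiliary loss improves the target task at some cost elsewhere.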

