[2510.13900] Narrow Finetuning Leaves Clearly Readable Traces in Activation Differences
Computer Science > Computation and Language
arXiv:2510.13900 (cs)
[Submitted on 14 Oct 2025 (v1), last revised 2 Mar 2026 (this version, v2)]

Title: Narrow Finetuning Leaves Clearly Readable Traces in Activation Differences
Authors: Julian Minder, Clément Dumas, Stewart Slocum, Helena Casademunt, Cameron Holmes, Robert West, Neel Nanda

Abstract: Finetuning on narrow domains has become an essential tool to adapt Large Language Models (LLMs) to specific tasks and to create models with known unusual properties that are useful for research. We show that narrow finetuning creates strong biases in LLM activations that can be interpreted to understand the finetuning domain. These biases can be discovered using simple tools from model diffing, the study of differences between models before and after finetuning. In particular, analyzing activation differences on the first few tokens of random text, and steering by adding this difference to the model activations, produces text similar in format and general content to the finetuning data. We demonstrate that these analyses contain crucial information by creating an LLM-based interpretability agent to understand the finetuning domain. With access to the bias, the agent performs significantly better than baseline agents using simple prompting. Our analysis sp...
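The core model-diffing step described in the abstract can be illustrated with a minimal sketch: compute the mean activation difference between a finetuned and a base model on the first few tokens of random text, then "steer" by adding that difference vector back into the activations. The toy arrays below stand in for real residual-stream activations; the injected `bias` vector is a hypothetical stand-in for the finetuning-induced bias, not data from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_tokens = 16, 5

# Stand-in for the base model's residual-stream activations on the
# first few tokens of random text (toy data, not real model outputs).
base_acts = rng.normal(size=(n_tokens, d_model))

# Hypothetical finetuning-induced bias: narrow finetuning shifts the
# finetuned model's activations by a roughly constant direction.
bias = np.full(d_model, 0.5)
ft_acts = base_acts + bias + rng.normal(scale=0.01, size=(n_tokens, d_model))

# Model diffing: average the activation difference over the first tokens.
diff = (ft_acts - base_acts).mean(axis=0)

# Steering: add the recovered difference vector to the base activations,
# nudging the base model toward the finetuning domain.
steered_acts = base_acts + diff
```

In a real setting `base_acts` and `ft_acts` would come from forward passes of the two models at a chosen layer, and the steering vector would be added during generation; the averaging-and-adding logic is the same.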