[2602.20904] Transcoder Adapters for Reasoning-Model Diffing
Summary
This paper introduces transcoder adapters, a method for learning an interpretable approximation of how fine-tuning changes a reasoning model's internal MLP computation, and demonstrates their effectiveness in capturing reasoning behaviors.
Why It Matters
Understanding the internal mechanisms of reasoning models is crucial for interpreting and improving them. This research provides insight into how fine-tuning reshapes a model's internal computation, which can guide future work on training and analyzing reasoning models.
Key Takeaways
- Transcoder adapters help interpret changes in reasoning models after fine-tuning.
- The study reveals that only a small fraction of adapter features (~8%) have activating examples directly related to reasoning behaviors.
- Hesitation tokens (e.g., "wait") in responses can be traced to a small set (~2.4%) of adapter features, highlighting their role in model outputs.
- Adapters typically recover 50-90% of the accuracy gains from reasoning fine-tuning on reasoning benchmarks.
- The findings suggest broader applications for transcoder adapters in studying model fine-tuning.
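To make the attribution idea behind the hesitation finding concrete, here is a minimal, hypothetical sketch of direct-effect feature attribution: each adapter feature's contribution to a "wait" logit is its activation times the alignment of its decoder column with that token's unembedding direction. All names (`W_dec`, `v_wait`) and sizes are illustrative, and this simplified direct-path scoring is a stand-in for the paper's full attribution graphs:

```python
import numpy as np

rng = np.random.default_rng(1)
D, F = 16, 32  # residual-stream dim and number of adapter features (toy sizes)

# Hypothetical trained adapter decoder and one token position's activations.
W_dec = rng.normal(size=(D, F))          # decoder: feature -> residual direction
f = np.maximum(rng.normal(size=F), 0.0)  # sparse ReLU feature activations

# Hypothetical residual-stream direction whose logit corresponds to a
# hesitation token such as "wait" (i.e., that token's unembedding row).
v_wait = rng.normal(size=D)

# Direct-effect attribution: feature i contributes f[i] * (decoder column i
# dot v_wait) to the "wait" logit, since the adapter output is W_dec @ f.
scores = f * (W_dec.T @ v_wait)

# Rank features by absolute contribution; typically a small subset dominates.
top = np.argsort(-np.abs(scores))[:3]
print(top, scores[top])
```

Sorting by `|scores|` rather than raw scores keeps strongly suppressive features visible alongside promoting ones, mirroring how a feature can either drive or inhibit hesitation.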
Computer Science > Machine Learning
arXiv:2602.20904 (cs)
[Submitted on 24 Feb 2026]
Title: Transcoder Adapters for Reasoning-Model Diffing
Authors: Nathan Hu, Jake Ward, Thomas Icard, Christopher Potts
Abstract: While reasoning models are increasingly ubiquitous, the effects of reasoning training on a model's internal mechanisms remain poorly understood. In this work, we introduce transcoder adapters, a technique for learning an interpretable approximation of the difference in MLP computation before and after fine-tuning. We apply transcoder adapters to characterize the differences between Qwen2.5-Math-7B and its reasoning-distilled variant, DeepSeek-R1-Distill-Qwen-7B. Learned adapters are faithful to the target model's internal computation and next-token predictions. When evaluated on reasoning benchmarks, adapters match the reasoning model's response lengths and typically recover 50-90% of the accuracy gains from reasoning fine-tuning. Adapter features are sparsely activating and interpretable. When examining adapter features, we find that only ~8% have activating examples directly related to reasoning behaviors. We deeply study one such behavior -- the production of hesitation tokens (e.g., "wait"). Using attribution graphs, we trace hesitation to only ~2.4% of adapter features (5.6k total) performing one of two functions. These fe...
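The core idea described in the abstract can be sketched in a few lines: an adapter with sparse ReLU features is fit to reproduce the *difference* between a base MLP and a fine-tuned MLP, so that base output plus adapter output approximates the fine-tuned computation. The sketch below is not the authors' implementation; the toy MLPs stand in for matching layers of Qwen2.5-Math-7B and DeepSeek-R1-Distill-Qwen-7B, and for brevity it freezes a random encoder and fits only the decoder by least squares rather than training end to end:

```python
import numpy as np

rng = np.random.default_rng(0)
D, H, F, N = 16, 64, 32, 512  # residual dim, MLP hidden, adapter features, samples

# Frozen "base" MLP (stand-in for one layer of Qwen2.5-Math-7B).
W_in = rng.normal(size=(H, D)) / np.sqrt(D)
W_out = rng.normal(size=(D, H)) / np.sqrt(H)
base = lambda X: np.maximum(X @ W_in.T, 0.0) @ W_out.T

# "Fine-tuned" MLP: base weights plus an unknown perturbation
# (stand-in for the same layer of DeepSeek-R1-Distill-Qwen-7B).
dW = 0.1 * rng.normal(size=(H, D)) / np.sqrt(D)
tuned = lambda X: np.maximum(X @ (W_in + dW).T, 0.0) @ W_out.T

# Transcoder adapter: sparse ReLU features of the MLP input, decoded linearly,
# fit to reproduce the computation diff tuned(X) - base(X).
W_enc = rng.normal(size=(F, D)) / np.sqrt(D)
X = rng.normal(size=(N, D))
feats = np.maximum(X @ W_enc.T, 0.0)  # sparse activations (about half are zero)
target = tuned(X) - base(X)           # the difference the adapter must explain
W_dec, *_ = np.linalg.lstsq(feats, target, rcond=None)

# Diffing check: base MLP + adapter approximates the fine-tuned MLP
# better than the base MLP alone does.
approx = base(X) + feats @ W_dec
err_with = np.mean((approx - tuned(X)) ** 2)
err_without = np.mean((base(X) - tuned(X)) ** 2)
print(err_with < err_without)  # True
```

Fitting the diff rather than the fine-tuned MLP directly is what makes the adapter a *diffing* tool: features only activate where the two models' computations disagree, so they localize what fine-tuning changed.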