[2603.27707] Low-Rank Adaptation Reduces Catastrophic Forgetting in Sequential Transformer Encoder Fine-Tuning: Controlled Empirical Evidence and Frozen-Backbone Representation Probes

arXiv - Machine Learning 4 min read

About this article

Computer Science > Machine Learning

arXiv:2603.27707 (cs) [Submitted on 29 Mar 2026]

Title: Low-Rank Adaptation Reduces Catastrophic Forgetting in Sequential Transformer Encoder Fine-Tuning: Controlled Empirical Evidence and Frozen-Backbone Representation Probes

Authors: Ashish Pandey

Abstract: Sequential fine-tuning of pretrained language encoders often overwrites previously acquired capabilities, but the forgetting behavior of parameter-efficient updates remains under-characterized. We present a controlled empirical study of Low-Rank Adaptation (LoRA) in sequential transformer encoder fine-tuning, with companion representation probes that test a frozen-backbone explanation of its robustness. In five full-validation BERT-base reruns on an RTE -> MRPC -> CoLA -> SST-2 sequence, full fine-tuning yields 19.9% +/- 4.8% average forgetting, whereas standard LoRA (r=8, query/value modules) yields 0.6% +/- 1.4% (paired t-test, p=0.002, Cohen's d_s=3.12). Task-level analyses confirm this reduction is not merely an aggregate effect. Secondary experiments on RoBERTa-base show the same pattern, and the strongest EWC baseline remains at 15.5% +/- 1.4% forgetting. A six-task extension reveals that low average forgetting can hide strong task-level heterogeneity...
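
The LoRA setup named in the abstract (rank r=8 adapters on the query/value attention projections of BERT-base) can be reproduced with the Hugging Face peft library. The sketch below is illustrative, not the paper's code: lora_alpha, lora_dropout, the checkpoint name, and num_labels are assumptions the abstract does not specify.

```python
# Minimal sketch (not the paper's code) of the LoRA configuration the
# abstract reports: rank r=8 adapters on BERT-base's query/value modules.
# lora_alpha, lora_dropout, and num_labels are assumed values.
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, TaskType, get_peft_model

backbone = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # e.g. RTE is a two-class task
)

config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,                                # rank reported in the abstract
    lora_alpha=16,                      # assumption; not stated in the abstract
    lora_dropout=0.1,                   # assumption; not stated in the abstract
    target_modules=["query", "value"],  # adapt only the q/v projections
)

model = get_peft_model(backbone, config)
model.print_trainable_parameters()      # backbone weights stay frozen;
                                        # only the low-rank adapters train
```

Because the pretrained weight matrices receive no gradient updates in this setup, only the small low-rank adapters change across tasks; that is the frozen-backbone property the paper's representation probes are described as testing.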

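The headline comparison (19.9% vs. 0.6% average forgetting over five reruns, paired t-test, Cohen's d_s) rests on a forgetting metric computed over the task sequence. A minimal sketch under two assumptions: forgetting follows the common "best earlier accuracy minus final accuracy" definition, which the truncated abstract does not confirm is the paper's exact formula, and the per-seed values below are placeholders, not the paper's data.

```python
# Sketch of an average-forgetting metric and the reported comparison
# statistics. The metric definition is an assumption (the common form:
# for each earlier task, forgetting = best accuracy at any earlier stage
# minus accuracy after the final task, averaged over all but the last task).
import numpy as np
from scipy import stats

def average_forgetting(acc: np.ndarray) -> float:
    """acc[t, i] = accuracy on task i measured after training stage t.

    acc has shape [num_stages, num_tasks], with tasks trained in order
    and one evaluation pass after each stage.
    """
    T = acc.shape[0]
    per_task = [acc[:T - 1, i].max() - acc[T - 1, i] for i in range(T - 1)]
    return float(np.mean(per_task))

# Hypothetical per-seed forgetting values for full fine-tuning vs. LoRA
# (five reruns each, as in the paper; these numbers are placeholders).
full_ft = np.array([0.25, 0.18, 0.15, 0.22, 0.19])
lora    = np.array([0.01, 0.00, 0.02, 0.00, 0.00])

t, p = stats.ttest_rel(full_ft, lora)  # paired t-test over matched seeds
d_s = (full_ft.mean() - lora.mean()) / np.sqrt(
    (full_ft.var(ddof=1) + lora.var(ddof=1)) / 2
)
print(f"t = {t:.2f}, p = {p:.4f}, Cohen's d_s = {d_s:.2f}")
```

For equal-sized groups, the Cohen's d_s denominator reduces to sqrt((s1^2 + s2^2) / 2), the pooled standard deviation of the two samples.
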
Originally published on March 31, 2026. Curated by AI News.
