[2603.27707] Low-Rank Adaptation Reduces Catastrophic Forgetting in

[2603.27707] Low-Rank Adaptation Reduces Catastrophic Forgetting in Sequential Transformer Encoder Fine-Tuning: Controlled Empirical Evidence and Frozen-Backbone Representation Probes

arXiv - Machine Learning March 31, 2026 4 min read

About this article

Abstract page for arXiv paper 2603.27707: Low-Rank Adaptation Reduces Catastrophic Forgetting in Sequential Transformer Encoder Fine-Tuning: Controlled Empirical Evidence and Frozen-Backbone Representation Probes

Computer Science > Machine Learning arXiv:2603.27707 (cs) [Submitted on 29 Mar 2026] Title:Low-Rank Adaptation Reduces Catastrophic Forgetting in Sequential Transformer Encoder Fine-Tuning: Controlled Empirical Evidence and Frozen-Backbone Representation Probes Authors:Ashish Pandey View a PDF of the paper titled Low-Rank Adaptation Reduces Catastrophic Forgetting in Sequential Transformer Encoder Fine-Tuning: Controlled Empirical Evidence and Frozen-Backbone Representation Probes, by Ashish Pandey View PDF HTML (experimental) Abstract:Sequential fine-tuning of pretrained language encoders often overwrites previously acquired capabilities, but the forgetting behavior of parameter-efficient updates remains under-characterized. We present a controlled empirical study of Low-Rank Adaptation (LoRA) in sequential transformer encoder fine-tuning with companion representation probes that test a frozen-backbone explanation of its robustness. In five full-validation BERT-base reruns on an RTE->MRPC->CoLA->SST-2 sequence, full fine-tuning yields 19.9%+/-4.8% average forgetting, whereas standard LoRA (r=8, query/value modules) yields 0.6%+/-1.4% (paired t-test, p=0.002, Cohen's d_s=3.12). Task-level analyses confirm this reduction is not merely an aggregate effect. Secondary experiments on RoBERTa-base show the same pattern, and the strongest EWC baseline remains at 15.5%+/-1.4% forgetting. A six-task extension reveals that low average forgetting can hide strong task-level heterogene...

Originally published on March 31, 2026. Curated by AI News.

Machine Learning

[R] VOID: Video Object and Interaction Deletion (physically-consistent video inpainting)

We present VOID, a model for video object removal that aims to handle *physical interactions*, not just appearance. Most existing video i...

Reddit - Machine Learning · 1 min · about 3 hours ago

Machine Learning

FLUX 2 Pro (2026) Sketch to Image

I sketched a cow and tested how different models interpret it into a realistic image for downstream 3D generation, turns out some models ...

Reddit - Artificial Intelligence · 1 min · about 4 hours ago

Machine Learning

Improving AI models’ ability to explain their predictions

AI News - General · 9 min · about 4 hours ago

Machine Learning

[D] TMLR reviews seem more reliable than ICML/NeurIPS/ICLR

This year I submitted a paper to ICML for the first time. I have also experienced the review process at TMLR and ICLR. From my observatio...

Reddit - Machine Learning · 1 min · about 5 hours ago

[2603.27707] Low-Rank Adaptation Reduces Catastrophic Forgetting in Sequential Transformer Encoder Fine-Tuning: Controlled Empirical Evidence and Frozen-Backbone Representation Probes

About this article

Related Articles

[R] VOID: Video Object and Interaction Deletion (physically-consistent video inpainting)

FLUX 2 Pro (2026) Sketch to Image

Improving AI models’ ability to explain their predictions

[D] TMLR reviews seem more reliable than ICML/NeurIPS/ICLR

No comments

Stay updated with AI News