[2502.14762] Unlocking [CLS] Features for Continual Post-Training
Summary
The paper presents an approach to continual learning that introduces a parameter-efficient fine-tuning module, LuCA ("Learn and Calibrate"), and a token-level adaptation strategy built on it, TOSCA, which together balance stability and plasticity while improving performance.
Why It Matters
This research addresses the critical challenge of continual learning, where models must adapt to new tasks without forgetting previous knowledge. The proposed methods could significantly improve the efficiency and effectiveness of machine learning applications in dynamic environments, making it relevant for both academic research and practical implementations in AI.
Key Takeaways
- Introduces LuCA, a fine-tuning module for task-specific knowledge acquisition.
- Presents TOSCA, which adapts the model only at the final [CLS] token to preserve the backbone's generalization.
- Achieves state-of-the-art performance with significantly fewer parameters.
- Addresses the stability-plasticity trade-off in continual learning.
- Reduces training and inference complexity in machine learning models.
Computer Science > Machine Learning
arXiv:2502.14762 (cs)
[Submitted on 20 Feb 2025 (v1), last revised 19 Feb 2026 (this version, v2)]
Title: Unlocking [CLS] Features for Continual Post-Training
Authors: Murat Onur Yildirim, Elif Ceren Gok Yildirim, Joaquin Vanschoren
Abstract: Continual learning requires models to integrate new classes or domains over time while preserving previously acquired knowledge. Within this paradigm, foundation models often achieve strong performance, but they still remain subject to the stability-plasticity trade-off, where excessive plasticity leads to forgetting of prior knowledge, and excessive stability constrains the adaptation. This necessitates an effective post-training strategy that introduces minimal yet functional modifications. To address this challenge, we first introduce a new parameter-efficient fine-tuning module 'Learn and Calibrate', or LuCA, designed to acquire task-specific knowledge through an adapter-calibrator couple, enabling well-refined feature representations. Then, for each task, we deploy a sparse LuCA module on top of the last classification token [CLS] just before the classifier, which we refer to as 'Token-level Sparse Calibration and Adaptation', or TOSCA. By leaving the generalization capabilities of the foundation models intact and adapting exclusively via the last token, our appr...
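To make the mechanism in the abstract concrete, here is a minimal NumPy sketch of an adapter-calibrator couple applied only to the final [CLS] feature vector of a frozen backbone. All names, shapes, and the specific choice of nonlinearities are illustrative assumptions for this digest, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

class LuCASketch:
    """Hypothetical sketch of a LuCA-style module: a low-rank residual
    adapter followed by a per-dimension calibrator gate, operating on a
    single [CLS] feature vector. The backbone producing that vector is
    assumed frozen; only these small matrices would be trained per task."""

    def __init__(self, dim, bottleneck):
        # Adapter: down-project, nonlinearity, up-project (residual).
        self.w_down = rng.normal(0.0, 0.02, (dim, bottleneck))
        self.w_up = rng.normal(0.0, 0.02, (bottleneck, dim))
        # Calibrator: produces a sigmoid gate that rescales each feature.
        self.w_cal = rng.normal(0.0, 0.02, (dim, dim))

    def __call__(self, cls_feat):
        # Residual adapter refines the frozen [CLS] representation.
        adapted = cls_feat + np.maximum(cls_feat @ self.w_down, 0.0) @ self.w_up
        # Calibrator gates the adapted feature dimension-wise.
        gate = 1.0 / (1.0 + np.exp(-(adapted @ self.w_cal)))  # sigmoid
        return gate * adapted

cls_feat = rng.normal(size=768)          # frozen backbone's [CLS] output
module = LuCASketch(dim=768, bottleneck=64)
refined = module(cls_feat)               # task-specific, calibrated feature
print(refined.shape)
```

Because the module touches only the last token's feature just before the classifier, the parameter count scales with the bottleneck width rather than the backbone size, which is consistent with the parameter-efficiency claim in the takeaways above.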