[2502.14762] Unlocking [CLS] Features for Continual Post-Training

arXiv - Machine Learning

Summary

The paper introduces LuCA, a parameter-efficient fine-tuning module, and TOSCA, a token-level adaptation method that deploys LuCA on the final [CLS] token. Together they balance stability and plasticity in continual learning while improving performance.

Why It Matters

This research addresses the critical challenge of continual learning, where models must adapt to new tasks without forgetting previous knowledge. The proposed methods could significantly improve the efficiency and effectiveness of machine learning applications in dynamic environments, making it relevant for both academic research and practical implementations in AI.

Key Takeaways

  • Introduces LuCA, a fine-tuning module for task-specific knowledge acquisition.
  • Presents TOSCA, which adapts models at the token level to maintain generalization.
  • Achieves state-of-the-art performance with significantly fewer parameters.
  • Addresses the stability-plasticity trade-off in continual learning.
  • Reduces training and inference complexity in machine learning models.
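Based on the abstract's description of LuCA as an "adapter-calibrator couple", a minimal PyTorch sketch might look like the following. The layer shapes, activations, and the gating form of the calibrator are illustrative assumptions, not the paper's exact specification.

```python
import torch
import torch.nn as nn

class LuCA(nn.Module):
    """Hypothetical sketch of a 'Learn and Calibrate' (LuCA) module:
    a bottleneck adapter paired with a calibrator that gates the adapted
    features before adding them back residually. All layer sizes and
    activation choices here are assumptions for illustration."""

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        # Adapter: down-project, non-linearity, up-project
        self.adapter = nn.Sequential(
            nn.Linear(dim, bottleneck),
            nn.GELU(),
            nn.Linear(bottleneck, dim),
        )
        # Calibrator: element-wise sigmoid gate over the adapted features
        self.calibrator = nn.Sequential(
            nn.Linear(dim, dim),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.adapter(x)
        # Calibrated residual update: gate decides how much of the
        # adapted signal is injected into the original features
        return x + self.calibrator(h) * h
```

The bottleneck keeps the per-task parameter count small relative to the backbone, which is consistent with the paper's parameter-efficiency claim.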

Computer Science > Machine Learning

arXiv:2502.14762 (cs) [Submitted on 20 Feb 2025 (v1), last revised 19 Feb 2026 (this version, v2)]

Title: Unlocking [CLS] Features for Continual Post-Training

Authors: Murat Onur Yildirim, Elif Ceren Gok Yildirim, Joaquin Vanschoren

Abstract: Continual learning requires models to integrate new classes or domains over time while preserving previously acquired knowledge. Within this paradigm, foundation models often achieve strong performance, but they still remain subject to the stability-plasticity trade-off, where excessive plasticity leads to forgetting of prior knowledge, and excessive stability constrains adaptation. This necessitates an effective post-training strategy that introduces minimal yet functional modifications. To address this challenge, we first introduce a new parameter-efficient fine-tuning module 'Learn and Calibrate', or LuCA, designed to acquire task-specific knowledge through an adapter-calibrator couple, enabling well-refined feature representations. Then, for each task, we deploy a sparse LuCA module on top of the last classification token [CLS] just before the classifier, which we refer to as 'Token-level Sparse Calibration and Adaptation', or TOSCA. By leaving the generalization capabilities of the foundation models intact and adapting exclusively via the last token, our appr...
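The per-task deployment the abstract describes, a frozen backbone with a small module acting only on the [CLS] embedding just before a task-specific classifier, can be sketched as below. The `add_task` interface and the simple residual adapter standing in for a full LuCA couple are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class TOSCA(nn.Module):
    """Hypothetical sketch of TOSCA: the pre-trained backbone stays frozen,
    and each task gets its own lightweight module operating only on the
    [CLS] features just before a per-task classifier. Shapes and the
    add_task API are illustrative assumptions."""

    def __init__(self, backbone: nn.Module, dim: int):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad_(False)  # generalization left intact
        self.dim = dim
        self.adapters = nn.ModuleList()  # one token-level module per task
        self.heads = nn.ModuleList()     # one classifier per task

    def add_task(self, num_classes: int, bottleneck: int = 64):
        # Stands in for a sparse LuCA adapter-calibrator couple
        self.adapters.append(nn.Sequential(
            nn.Linear(self.dim, bottleneck),
            nn.GELU(),
            nn.Linear(bottleneck, self.dim),
        ))
        self.heads.append(nn.Linear(self.dim, num_classes))

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        cls = self.backbone(x)  # assumed to return the [CLS] embedding
        cls = cls + self.adapters[task_id](cls)  # token-level residual update
        return self.heads[task_id](cls)
```

Because only the per-task adapter and head are trainable, each new task adds a small parameter budget while the shared backbone, and hence prior knowledge, is untouched.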
