[2602.14078] Policy Gradient with Adaptive Entropy Annealing for Continual Fine-Tuning

arXiv - AI · 4 min read

Summary

This paper presents adaptive entropy annealing (aEPG), a training strategy for continual fine-tuning of large pretrained vision models that takes a reinforcement-learning view of classification in order to directly minimize misclassification error.

Why It Matters

The research addresses the critical issue of catastrophic forgetting in machine learning models when adapting to new tasks. By proposing a new training strategy that improves performance across various benchmarks, this work contributes to the ongoing development of more robust AI systems capable of continual learning.

Key Takeaways

  • Introduces aEPG, a training strategy that transitions from exploratory (CE-like) to exploitative (EPG-like) learning.
  • Finds that driving output predictions toward lower entropy improves adaptation to new tasks.
  • Outperforms cross-entropy-based fine-tuning across diverse benchmarks.
  • Reformulates classification as a one-step Markov Decision Process, directly minimizing the 0-1 misclassification loss.
  • Highlights the importance of prioritizing high-confidence samples in training (see the derivation sketched after this list).
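
The weighting relationship behind the last two takeaways follows from standard policy-gradient algebra. A minimal derivation, assuming the reward is the 0-1 indicator of a correct prediction and the policy is the model's softmax output (consistent with the abstract below):

$$J(\theta) = \mathbb{E}_x\,\mathbb{E}_{a \sim \pi_\theta(\cdot \mid x)}\big[\mathbb{1}[a = y]\big] = \mathbb{E}_x\big[\pi_\theta(y \mid x)\big]$$

$$\nabla_\theta J = \mathbb{E}_x\big[\nabla_\theta\, \pi_\theta(y \mid x)\big] = \mathbb{E}_x\big[\pi_\theta(y \mid x)\,\nabla_\theta \log \pi_\theta(y \mid x)\big]$$

The expectation over actions is a finite sum over classes, so it is computed exactly rather than sampled, which is why the gradient estimate is low-variance. Since the cross-entropy gradient is $-\nabla_\theta \log \pi_\theta(y \mid x)$, the EPG update is the CE update rescaled per sample by the confidence $\pi_\theta(y \mid x)$: CE upweights low-confidence samples (exploration), while EPG upweights high-confidence ones (exploitation).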

Computer Science > Machine Learning
arXiv:2602.14078 (cs) [Submitted on 15 Feb 2026]

Title: Policy Gradient with Adaptive Entropy Annealing for Continual Fine-Tuning
Authors: Yaqian Zhang, Bernhard Pfahringer, Eibe Frank, Albert Bifet

Abstract: Despite their success, large pretrained vision models remain vulnerable to catastrophic forgetting when adapted to new tasks in class-incremental settings. Parameter-efficient fine-tuning (PEFT) alleviates this by restricting trainable parameters, yet most approaches still rely on cross-entropy (CE) loss, a surrogate for the 0-1 loss, to learn from new data. We revisit this choice and revive the true objective (0-1 loss) through a reinforcement learning perspective. By formulating classification as a one-step Markov Decision Process, we derive an Expected Policy Gradient (EPG) method that directly minimizes misclassification error with a low-variance gradient estimation. Our analysis shows that CE can be interpreted as EPG with an additional sample-weighting mechanism: CE encourages exploration by emphasizing low-confidence samples, while EPG prioritizes high-confidence ones. Building on this insight, we propose adaptive entropy annealing (aEPG), a training strategy that transitions from exploratory (CE-like) to exploitative (EPG-like) learning. aEPG-based methods out...
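
To make the CE-to-EPG transition concrete, here is a minimal PyTorch sketch of both losses and a blended objective. This is an illustration, not the paper's implementation: the function names are hypothetical, and the linear blend `alpha` stands in for the paper's adaptive annealing schedule, whose details are not given in this excerpt.

```python
import torch
import torch.nn.functional as F

def epg_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """EPG objective for classification viewed as a one-step MDP.

    Maximizing the expected 0-1 reward E[pi(y|x)] is equivalent to
    minimizing -pi(y|x). The expectation over actions is a finite sum
    over classes, computed exactly, so no sampling variance arises.
    """
    probs = F.softmax(logits, dim=-1)
    p_true = probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    return -p_true.mean()

def ce_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # Standard cross-entropy: per the paper's analysis, this is the EPG
    # update re-weighted by 1 / pi(y|x), emphasizing low-confidence
    # (exploratory) samples.
    return F.cross_entropy(logits, targets)

def annealed_loss(logits: torch.Tensor, targets: torch.Tensor,
                  alpha: float) -> torch.Tensor:
    # Illustrative exploration-to-exploitation blend: CE-like when
    # alpha = 0 (early training), EPG-like when alpha = 1 (late).
    return (1.0 - alpha) * ce_loss(logits, targets) + alpha * epg_loss(logits, targets)
```

In a training loop, `alpha` could simply ramp from 0 to 1 (e.g., `alpha = step / total_steps`); aEPG instead adapts the transition (the name suggests an entropy-based criterion, but the mechanism is not specified in this excerpt).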

Related Articles

Machine Learning

[R] Fine-tuning services report

If you have some data and want to train or run a small custom model but don't have powerful enough hardware for training, fine-tuning ser...

Reddit - Machine Learning · 1 min ·
Machine Learning

[D] Does ML have a "bible"/reference textbook at the Intermediate/Advanced level?

Hello, everyone! This is my first time posting here and I apologise if the question is, perhaps, a bit too basic for this sub-reddit. A b...

Reddit - Machine Learning · 1 min ·
Machine Learning

[D] ICML 2026 review policy debate: 100 responses suggest Policy B may score higher, while Policy A shows higher confidence

A week ago I made a thread asking whether ICML 2026’s review policy might have affected review outcomes, especially whether Policy A pape...

Reddit - Machine Learning · 1 min ·
Machine Learning

Nomadic raises $8.4 million to wrangle the data pouring off autonomous vehicles | TechCrunch

The company turns footage from robots into structured, searchable datasets with a deep learning model.

TechCrunch - AI · 6 min ·