[2603.24139] Tutor-Student Reinforcement Learning: A Dynamic Curriculum for Robust Deepfake Detection
Computer Science > Computer Vision and Pattern Recognition
arXiv:2603.24139 (cs)
[Submitted on 25 Mar 2026]

Title: Tutor-Student Reinforcement Learning: A Dynamic Curriculum for Robust Deepfake Detection
Authors: Zhanhe Lei, Zhongyuan Wang, Jikang Cheng, Baojin Huang, Yuhong Yang, Zhen Han, Chao Liang, Dengpan Ye

Abstract: Standard supervised training for deepfake detection treats all samples with uniform importance, which can be suboptimal for learning robust and generalizable features. In this work, we propose a novel Tutor-Student Reinforcement Learning (TSRL) framework to dynamically optimize the training curriculum. Our method models the training process as a Markov Decision Process where a "Tutor" agent learns to guide a "Student" (the deepfake detector). The Tutor, implemented as a Proximal Policy Optimization (PPO) agent, observes a rich state representation for each training sample, encapsulating not only its visual features but also its historical learning dynamics, such as EMA loss and forgetting counts. Based on this state, the Tutor takes an action by assigning a continuous weight (0-1) to the sample's loss, thereby dynamically re-weighting the training batch. The Tutor is rewarded based on the Student's immediate performance change, specifically rewarding transitions from incor...
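The abstract's training loop can be sketched as follows. This is a minimal toy illustration, not the authors' implementation: the deepfake detector is replaced by a logistic-regression "Student" on synthetic features, and the PPO "Tutor" is replaced by a simple heuristic policy that maps each sample's state (its EMA loss) to a continuous weight in (0, 1). The per-sample EMA loss, forgetting counts, and the reward based on incorrect-to-correct transitions mirror the quantities named in the abstract; all variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary classification data standing in for real/fake features.
X = rng.normal(size=(200, 8))
w_true = rng.normal(size=8)
y = (X @ w_true > 0).astype(float)

# "Student": logistic regression trained by weighted gradient descent.
w = np.zeros(8)

def probs(w):
    return 1.0 / (1.0 + np.exp(-(X @ w)))

def per_sample_loss(w):
    p = np.clip(probs(w), 1e-7, 1 - 1e-7)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

# Per-sample state features: EMA loss and forgetting counts, as in the
# paper's state description. The learned PPO policy is replaced by a fixed
# heuristic below -- an assumption for illustration only.
ema_loss = per_sample_loss(w).copy()
forget_counts = np.zeros(len(y))
prev_correct = (probs(w) > 0.5) == (y > 0.5)

lr, beta = 0.5, 0.9
for step in range(200):
    ema_loss = beta * ema_loss + (1 - beta) * per_sample_loss(w)
    # Tutor "action": a continuous weight in (0, 1) per sample, here
    # up-weighting samples whose EMA loss is above the batch mean.
    weights = 1.0 / (1.0 + np.exp(-(ema_loss - ema_loss.mean())))
    # Student update: gradient step on the weighted batch loss.
    grad = X.T @ (weights * (probs(w) - y)) / len(y)
    w -= lr * grad
    # Track forgetting events (correct -> incorrect) and a reward signal
    # crediting incorrect -> correct transitions after the update.
    correct = (probs(w) > 0.5) == (y > 0.5)
    forget_counts += (prev_correct & ~correct)
    reward = int((~prev_correct & correct).sum()) - int((prev_correct & ~correct).sum())
    prev_correct = correct

accuracy = prev_correct.mean()
print(f"final accuracy: {accuracy:.2f}")
```

In the actual TSRL framework the weight assignment is the output of a trained PPO agent conditioned on visual features as well as these learning-dynamics statistics; the heuristic here only shows where that policy plugs into the loop.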