[2603.24139] Tutor-Student Reinforcement Learning: A Dynamic Curriculum for Robust Deepfake Detection
Computer Science > Computer Vision and Pattern Recognition
arXiv:2603.24139 (cs)
[Submitted on 25 Mar 2026]

Title: Tutor-Student Reinforcement Learning: A Dynamic Curriculum for Robust Deepfake Detection
Authors: Zhanhe Lei, Zhongyuan Wang, Jikang Cheng, Baojin Huang, Yuhong Yang, Zhen Han, Chao Liang, Dengpan Ye

Abstract: Standard supervised training for deepfake detection treats all samples with uniform importance, which can be suboptimal for learning robust and generalizable features. In this work, we propose a novel Tutor-Student Reinforcement Learning (TSRL) framework to dynamically optimize the training curriculum. Our method models the training process as a Markov Decision Process where a "Tutor" agent learns to guide a "Student" (the deepfake detector). The Tutor, implemented as a Proximal Policy Optimization (PPO) agent, observes a rich state representation for each training sample, encapsulating not only its visual features but also its historical learning dynamics, such as EMA loss and forgetting counts. Based on this state, the Tutor takes an action by assigning a continuous weight (0-1) to the sample's loss, thereby dynamically re-weighting the training batch. The Tutor is rewarded based on the Student's immediate performance change, specifically rewarding transitions from incor...
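The abstract's training loop can be sketched as follows. This is a minimal toy illustration, not the authors' implementation: the deepfake detector is replaced by a logistic-regression "Student" on synthetic features, and the PPO "Tutor" is replaced by a simple heuristic policy that maps each sample's state (its EMA loss) to a continuous weight in (0, 1). The per-sample EMA loss, forgetting counts, and the reward based on incorrect-to-correct transitions mirror the quantities named in the abstract; all variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary classification data standing in for real/fake features.
X = rng.normal(size=(200, 8))
w_true = rng.normal(size=8)
y = (X @ w_true > 0).astype(float)

# "Student": logistic regression trained by weighted gradient descent.
w = np.zeros(8)

def probs(w):
    return 1.0 / (1.0 + np.exp(-(X @ w)))

def per_sample_loss(w):
    p = np.clip(probs(w), 1e-7, 1 - 1e-7)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

# Per-sample state features: EMA loss and forgetting counts, as in the
# paper's state description. The learned PPO policy is replaced by a fixed
# heuristic below -- an assumption for illustration only.
ema_loss = per_sample_loss(w).copy()
forget_counts = np.zeros(len(y))
prev_correct = (probs(w) > 0.5) == (y > 0.5)

lr, beta = 0.5, 0.9
for step in range(200):
    ema_loss = beta * ema_loss + (1 - beta) * per_sample_loss(w)
    # Tutor "action": a continuous weight in (0, 1) per sample, here
    # up-weighting samples whose EMA loss is above the batch mean.
    weights = 1.0 / (1.0 + np.exp(-(ema_loss - ema_loss.mean())))
    # Student update: gradient step on the weighted batch loss.
    grad = X.T @ (weights * (probs(w) - y)) / len(y)
    w -= lr * grad
    # Track forgetting events (correct -> incorrect) and a reward signal
    # crediting incorrect -> correct transitions after the update.
    correct = (probs(w) > 0.5) == (y > 0.5)
    forget_counts += (prev_correct & ~correct)
    reward = int((~prev_correct & correct).sum()) - int((prev_correct & ~correct).sum())
    prev_correct = correct

accuracy = prev_correct.mean()
print(f"final accuracy: {accuracy:.2f}")
```

In the actual TSRL framework the weight assignment is the output of a trained PPO agent conditioned on visual features as well as these learning-dynamics statistics; the heuristic here only shows where that policy plugs into the loop.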