[2406.03862] Robust Deep Reinforcement Learning against Adversarial Behavior Manipulation
Summary
This paper investigates behavior-targeted attacks on reinforcement learning systems, proposes a new imitation-learning-based attack that works under limited access to the victim's policy, and introduces a defense based on time-discounted regularization to harden policies against such adversarial manipulation.
Why It Matters
As reinforcement learning systems are increasingly deployed in critical applications, understanding and mitigating adversarial behavior manipulation is essential. This research offers a pioneering defense mechanism that can improve the security and reliability of these systems, highlighting the importance of robust AI in real-world scenarios.
Key Takeaways
- Behavior-targeted attacks can manipulate reinforcement learning systems through adversarial interventions.
- Existing attacks often require white-box access, limiting their applicability.
- The proposed imitation learning method enables effective attacks with only limited access to the victim's policy and is environment-agnostic.
- Time-discounted regularization enhances defense against these attacks while maintaining performance.
- This study introduces the first defense strategy specifically designed for behavior-targeted attacks.
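The time-discounted regularizer in the last two takeaways can be illustrated with a small sketch. The idea, per the paper's analysis, is that a policy's sensitivity to state perturbations matters most early in the trajectory, so a robustness penalty should weight early timesteps more heavily. The sketch below (hypothetical; the paper's exact formulation may differ) penalizes the KL divergence between the policy's action distribution on clean and perturbed states, discounted by `gamma ** t`:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl(p, q, eps=1e-12):
    """KL divergence between two discrete action distributions."""
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def time_discounted_regularizer(policy_logits, clean_states,
                                perturbed_states, gamma=0.99):
    """Compute sum_t gamma**t * KL(pi(.|s_t) || pi(.|s_t + delta_t)).

    Because gamma < 1, early timesteps receive the largest weight,
    matching the insight that sensitivity early in the trajectory
    dominates defense performance. (Illustrative sketch only.)
    """
    total = 0.0
    for t, (s, s_adv) in enumerate(zip(clean_states, perturbed_states)):
        p = softmax(policy_logits(s))
        q = softmax(policy_logits(s_adv))
        total += (gamma ** t) * kl(p, q)
    return total
```

In training, such a term would be added to the task loss, trading a small amount of clean-task performance for reduced sensitivity to adversarial state perturbations.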
Paper Details
arXiv:2406.03862 (cs) — Computer Science > Machine Learning
Submitted on 6 Jun 2024 (v1), last revised 17 Feb 2026 (this version, v3)
Title: Robust Deep Reinforcement Learning against Adversarial Behavior Manipulation
Authors: Shojiro Yamabe, Kazuto Fukuchi, Jun Sakuma
Abstract: This study investigates behavior-targeted attacks on reinforcement learning and their countermeasures. Behavior-targeted attacks aim to manipulate the victim's behavior as desired by the adversary through adversarial interventions in state observations. Existing behavior-targeted attacks have some limitations, such as requiring white-box access to the victim's policy. To address this, we propose a novel attack method using imitation learning from adversarial demonstrations, which works under limited access to the victim's policy and is environment-agnostic. In addition, our theoretical analysis proves that the policy's sensitivity to state changes impacts defense performance, particularly in the early stages of the trajectory. Based on this insight, we propose time-discounted regularization, which enhances robustness against attacks while maintaining task performance. To the best of our knowledge, this is the first defense strategy specifically designed for behavior-targeted attacks.
Subjects: Machine Learning (cs.LG); ...
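The attack side of the abstract, imitation learning from adversarial demonstrations, can be illustrated with a toy behavior-cloning sketch. This is not the paper's algorithm: it simply fits a logistic policy to hypothetical (state, adversary-desired action) pairs on synthetic data, which shows why the approach needs only demonstrations rather than white-box access to the victim's weights:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical adversarial demonstrations: states paired with the
# actions the adversary wants the victim to take (synthetic data,
# not from the paper).
states = rng.normal(size=(200, 3))
target_actions = (states @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)

# Behavior cloning: fit a logistic policy to the demonstrations by
# gradient descent. Only the demonstrations are used -- no access to
# the victim policy's parameters is required.
w = np.zeros(3)
lr = 0.5
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(states @ w)))  # predicted action probability
    w -= lr * states.T @ (p - target_actions) / len(states)

# Fraction of demonstrations the cloned policy reproduces.
preds = (1.0 / (1.0 + np.exp(-(states @ w))) > 0.5).astype(float)
accuracy = (preds == target_actions).mean()
```

The cloned policy then serves as the target behavior the adversary steers the victim toward via state perturbations; the cloning step itself is environment-agnostic, since it depends only on the demonstration data.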