[2510.00922] On Discovering Algorithms for Adversarial Imitation Learning
Summary
This paper presents Discovered Adversarial Imitation Learning (DAIL), an approach that improves the stability of Adversarial Imitation Learning (AIL) by discovering data-driven reward assignment functions through an LLM-guided evolutionary framework.
Why It Matters
The research addresses a gap in Adversarial Imitation Learning by focusing on the often-overlooked role of reward assignment functions. By automating the discovery of these functions rather than relying on human design, the study improves the stability and performance of AIL, which matters for robotics and other settings where expert demonstrations are scarce.
Key Takeaways
- DAIL outperforms traditional human-designed reward assignment methods.
- The study highlights the importance of reward assignment in AIL stability.
- An LLM-guided evolutionary framework is proposed for discovering reward functions.
- DAIL generalizes across various environments and optimization algorithms.
- The research contributes to the understanding of training dynamics in AIL.
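The LLM-guided evolutionary search mentioned above can be pictured, in broad strokes, as a propose-evaluate-select loop. The sketch below is a generic illustration under stated assumptions, not the paper's implementation: `propose_variants` stands in for the LLM proposing mutated candidates, and `evaluate` stands in for training an imitation policy with a candidate RA function and measuring its performance. The toy demo at the bottom uses numeric candidates so the loop runs end to end.

```python
import random

def evolve_ra_functions(seed_candidates, propose_variants, evaluate,
                        generations=10, population=8):
    """Generic propose-evaluate-select loop (illustrative only).

    seed_candidates  : initial RA-function candidates (e.g. code strings)
    propose_variants : stands in for the LLM, mutating a promising candidate
    evaluate         : stands in for training a policy with the candidate
                       reward and returning its imitation performance
    """
    pool = list(seed_candidates)
    for _ in range(generations):
        # Rank candidates by fitness and keep the best as elites.
        scored = sorted(pool, key=evaluate, reverse=True)
        elites = scored[:max(2, population // 4)]
        # Refill the population with "LLM"-proposed offspring of elites.
        pool = elites + [propose_variants(random.choice(elites))
                         for _ in range(population - len(elites))]
    return max(pool, key=evaluate)

# Toy demo: candidates are numbers, fitness peaks at 3.0.
random.seed(0)
best = evolve_ra_functions(
    seed_candidates=[0.0, 1.0],
    propose_variants=lambda c: c + random.uniform(-0.5, 0.5),
    evaluate=lambda c: -(c - 3.0) ** 2,
    generations=30,
)
print(round(best, 2))
```

Because elites are carried over unchanged each generation, the best fitness in the pool never decreases; in the real setting each `evaluate` call is expensive (a full policy-training run), which is why the population and generation counts stay small.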
Computer Science > Artificial Intelligence
arXiv:2510.00922 (cs)
[Submitted on 1 Oct 2025 (v1), last revised 26 Feb 2026 (this version, v2)]
Title: On Discovering Algorithms for Adversarial Imitation Learning
Authors: Shashank Reddy Chirra, Jayden Teoh, Praveen Paruchuri, Pradeep Varakantham
Abstract: Adversarial Imitation Learning (AIL) methods, while effective in settings with limited expert demonstrations, are often considered unstable. These approaches typically decompose into two components: Density Ratio (DR) estimation $\frac{\rho_E}{\rho_{\pi}}$, where a discriminator estimates the relative occupancy of state-action pairs under the policy versus the expert; and Reward Assignment (RA), where this ratio is transformed into a reward signal used to train the policy. While significant research has focused on improving density estimation, the role of reward assignment in influencing training dynamics and final policy performance has been largely overlooked. RA functions in AIL are typically derived from divergence minimization objectives, relying heavily on human design and ingenuity. In this work, we take a different approach: we investigate the discovery of data-driven RA functions, i.e., based directly on the performance of the resulting imitation policy. To this end, we leverage an LLM-guided evolutionary framework ...
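The DR/RA decomposition in the abstract can be made concrete. The sketch below (a minimal illustration, not code from the paper) shows how a discriminator output is turned into a density-ratio estimate, and two well-known human-designed RA functions of the kind the paper compares against; the discovered DAIL functions themselves are not reproduced here.

```python
import math

def density_ratio(d):
    """Estimated rho_E / rho_pi from a discriminator output d = D(s, a),
    where D is trained to predict the probability that (s, a) came from
    the expert. At optimality D = rho_E / (rho_E + rho_pi), so the
    ratio is D / (1 - D)."""
    return d / (1.0 - d)

# Classic human-designed Reward Assignment (RA) functions mapping the
# discriminator output to a policy reward.
def ra_gail(d):
    # GAIL-style reward: -log(1 - D); always positive, grows as D -> 1.
    return -math.log(1.0 - d)

def ra_airl(d):
    # AIRL-style reward: log D - log(1 - D), i.e. the log density ratio;
    # zero when the discriminator is maximally uncertain (D = 0.5).
    return math.log(d) - math.log(1.0 - d)

for d in (0.1, 0.5, 0.9):
    print(f"D={d:.1f}  ratio={density_ratio(d):.3f}  "
          f"gail={ra_gail(d):.3f}  airl={ra_airl(d):.3f}")
```

The point the paper makes is that this last step, the choice of mapping from ratio to reward, shapes training dynamics just as much as the quality of the ratio estimate itself.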