[2603.14867] Sample-Efficient Hypergradient Estimation for Decentralized Bi-Level Reinforcement Learning
Computer Science > Machine Learning

arXiv:2603.14867 (cs)

[Submitted on 16 Mar 2026 (v1), last revised 25 Mar 2026 (this version, v2)]

Title: Sample-Efficient Hypergradient Estimation for Decentralized Bi-Level Reinforcement Learning

Authors: Mikoto Kudo, Takumi Tanabe, Akifumi Wachi, Youhei Akimoto

Abstract: Many strategic decision-making problems, such as environment design for warehouse robots, can be naturally formulated as bi-level reinforcement learning (RL), where a leader agent optimizes its objective while a follower solves a Markov decision process (MDP) conditioned on the leader's decisions. In many situations, a fundamental challenge arises: the leader cannot intervene in the follower's optimization process and can only observe its outcome. We address this decentralized setting by deriving the hypergradient of the leader's objective, i.e., the gradient with respect to the leader's strategy that accounts for changes in the follower's optimal policy. Unlike prior hypergradient-based methods that require extensive data for repeated state visits or rely on gradient estimators whose complexity can grow substantially with the dimensionality of the leader's decision space, we leverage the Boltzmann covariance trick to derive an alternative hypergradient formulation. This enables effi...
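The "Boltzmann covariance trick" the abstract invokes is, in its general form, the identity that for a Boltzmann distribution p_theta(x) ∝ exp(g_theta(x)), the gradient of an expectation equals a covariance: ∇_theta E_p[f(x)] = Cov_p(f(x), ∇_theta g_theta(x)). Below is a minimal numerical sketch of that identity for a discrete softmax distribution, checked against finite differences. The setup (g_theta(x_i) = theta_i, the function f, and all variable names) is illustrative and not taken from the paper, which applies the trick to hypergradients over follower policies.

```python
# Numerical check of the Boltzmann covariance identity (illustrative setup):
# for p_theta(x) ∝ exp(g_theta(x)),
#   ∇_theta E_p[f(x)] = Cov_p(f(x), ∇_theta g_theta(x)).
import numpy as np

rng = np.random.default_rng(0)
K = 5                        # number of discrete outcomes (assumption)
theta = rng.normal(size=K)   # parameters; here g_theta(x_i) = theta_i
f = rng.normal(size=K)       # arbitrary scalar function of the outcome

def boltzmann(theta):
    """Softmax (Boltzmann) distribution p_i ∝ exp(theta_i)."""
    z = np.exp(theta - theta.max())
    return z / z.sum()

def expectation(theta):
    """E_p[f] under the Boltzmann distribution."""
    return boltzmann(theta) @ f

# Covariance form of the gradient: with g_theta(x_i) = theta_i we have
# d g / d theta_j = 1{x = x_j}, so Cov(f, ∇g)_j = p_j * (f_j - E[f]).
p = boltzmann(theta)
grad_cov = p * (f - p @ f)

# Central finite-difference estimate of d E[f] / d theta_j for comparison.
eps = 1e-6
grad_fd = np.array([
    (expectation(theta + eps * np.eye(K)[j])
     - expectation(theta - eps * np.eye(K)[j])) / (2 * eps)
    for j in range(K)
])

print(np.allclose(grad_cov, grad_fd, atol=1e-6))  # True
```

The practical appeal, as the abstract suggests, is that the covariance form is a plain Monte Carlo average over samples from the current policy, so its cost does not blow up with the dimensionality of the leader's decision space the way repeated-visit or perturbation-based estimators can.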