[2603.04790] Diffusion Policy through Conditional Proximal Policy Optimization
Computer Science > Machine Learning
arXiv:2603.04790 (cs)
[Submitted on 5 Mar 2026]

Title: Diffusion Policy through Conditional Proximal Policy Optimization
Authors: Ben Liu, Shunpeng Yang, Hua Chen

Abstract: Reinforcement learning (RL) has been extensively employed in a wide range of decision-making problems, such as games and robotics. Recently, diffusion policies have shown strong potential in modeling multi-modal behaviors, enabling more diverse and flexible action generation than the conventional Gaussian policy. Despite various attempts to combine RL with diffusion, a key challenge is the difficulty of computing action log-likelihood under the diffusion model. This greatly hinders the direct application of diffusion policies in on-policy reinforcement learning. Most existing methods calculate or approximate the log-likelihood through the entire denoising process of the diffusion model, which can be memory- and computationally inefficient. To overcome this challenge, we propose a novel and efficient method to train a diffusion policy in an on-policy setting that requires only evaluating a simple Gaussian probability. This is achieved by aligning the policy iteration with the diffusion process, which is a distinct paradigm compared to previous work. Moreover, our formulation can naturally handle ent...
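To make the abstract's key claim concrete, the sketch below contrasts the two log-likelihood regimes it describes: each denoising step of a diffusion policy is conditionally Gaussian, so evaluating the probability of a single step is a closed-form diagonal-Gaussian density, whereas the marginal likelihood of the final action would require accounting for the whole denoising chain. This is only an illustrative sketch, not the paper's actual algorithm: `denoise_mean`, the fixed `sigma`, and the old/new mean offsets are hypothetical stand-ins for a learned denoiser and policy snapshots.

```python
import numpy as np

def gaussian_log_prob(x, mean, sigma):
    """Log-density of a diagonal Gaussian N(mean, sigma^2 I), summed over dims."""
    return np.sum(
        -0.5 * ((x - mean) / sigma) ** 2
        - np.log(sigma)
        - 0.5 * np.log(2.0 * np.pi),
        axis=-1,
    )

def denoise_mean(state, noisy_action):
    """Hypothetical stand-in for a learned denoiser network's predicted mean."""
    return 0.9 * noisy_action + 0.1 * state

rng = np.random.default_rng(0)
sigma = 0.2                      # fixed per-step noise scale (assumption)
state = np.array([0.5, -0.3])
noisy = np.array([0.1, 0.4])     # sample entering one denoising step

# One denoising step: the next iterate is a simple Gaussian sample,
# so its log-likelihood is available in closed form.
mean_new = denoise_mean(state, noisy)
action = mean_new + sigma * rng.standard_normal(2)
logp_new = gaussian_log_prob(action, mean_new, sigma)

# PPO-style clipped ratio built from that single Gaussian density
# (mean_new + 0.05 stands in for the pre-update policy's mean).
logp_old = gaussian_log_prob(action, mean_new + 0.05, sigma)
ratio = np.exp(logp_new - logp_old)
clipped_ratio = np.clip(ratio, 1.0 - 0.2, 1.0 + 0.2)
```

The point of the contrast: computing the density of the *final* action under the full diffusion model would require marginalizing over every intermediate denoising sample, while a per-step Gaussian evaluation like `logp_new` is a single closed-form expression, which is what makes an on-policy PPO-style ratio cheap to form in this regime.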