OpenAI News March 20, 2018 07:00

Variance reduction for policy gradient with action-dependent factorized baselines

Policy gradient methods have enjoyed great success in deep reinforcement learning but suffer from high variance of gradient estimates. The high variance problem is particularly exacerbated in problems with long horizons or high-dimensional action spaces. To...

Policy


OpenAI News April 21, 2017 07:00

Equivalence between policy gradients and soft Q-learning

Two of the leading approaches for model-free reinforcement learning are policy gradient methods and Q-learning methods. Q-learning methods can be effective and sample-efficient when they work; however, it is not well understood why they work, since...

Policy
