What to expect from AlphaZero's value predictions [D]
About this article
An AlphaZero agent has learnt to predict the value of a game state by training on data generated by self-play by the model and a series of predecessor models. By construction, this value should reflect the probability of winning against a copy of itself starting from the given state. To be more precise, the value measures the state's average strength against opponent players collected among all the predecessors of the current model. This average depends on the manner in which the training dat...
You've been blocked by network security.To continue, log in to your Reddit account or use your developer tokenIf you think you've been blocked by mistake, file a ticket below and we'll look into it.Log in File a ticket