[2405.18248] Extreme Value Monte Carlo Tree Search for Classical Planning
Computer Science > Artificial Intelligence

arXiv:2405.18248 (cs)

[Submitted on 28 May 2024 (v1), last revised 26 Mar 2026 (this version, v3)]

Title: Extreme Value Monte Carlo Tree Search for Classical Planning
Authors: Masataro Asai, Stephen Wissow

Abstract: Despite its success in board games and reinforcement learning (RL), Monte Carlo Tree Search (MCTS) combined with Multi-Armed Bandits (MABs) had seen limited success in domain-independent classical planning until recently. Previous work (Wissow and Asai 2024) showed that UCB1, designed for bounded rewards, performs poorly when applied to cost-to-go estimates in classical planning, which are unbounded in $\mathbb{R}$, and demonstrated improved performance using a Gaussian-reward MAB instead. This paper further sharpens our understanding of ideal bandits for planning tasks. Existing work has two issues: first, Gaussian MABs under-specify the support of cost-to-go estimates as $(-\infty,\infty)$, which we can narrow down. Second, Full Bellman backup (Schulte and Keller 2014), which backpropagates sample max/min, lacks theoretical justification. We use \emph{Peaks-Over-Threshold Extreme Value Theory} to resolve both issues at once, and propose a new bandit algorithm (UCB1-Uniform). We formally prove its regret bound and empirically demonstrate its performance in classical planning.
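The abstract's contrast hinges on UCB1's bounded-reward assumption. As background, a minimal sketch of the standard UCB1 arm-selection rule (mean plus an exploration bonus), assuming rewards in $[0, 1]$ -- precisely the assumption the paper argues is violated by unbounded cost-to-go estimates:

```python
import math

def ucb1_select(counts, means, c=math.sqrt(2)):
    """Pick the arm maximizing the UCB1 index: mean + c * sqrt(ln N / n_i).

    Assumes rewards lie in [0, 1]; with unbounded cost-to-go estimates,
    this index's concentration guarantees no longer apply.
    """
    # Play each unvisited arm once before applying the index.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    total = sum(counts)  # N: total plays across all arms
    return max(range(len(counts)),
               key=lambda i: means[i] + c * math.sqrt(math.log(total) / counts[i]))

# Example: with equal visit counts, the higher-mean arm is selected.
best = ucb1_select([5, 5], [0.9, 0.1])
```

The `counts`/`means` representation is an illustrative choice, not taken from the paper; the paper's UCB1-Uniform replaces this bounded-reward index with one derived from Peaks-Over-Threshold extreme value theory.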