[2503.24378] ACPBench Hard: Unrestrained Reasoning about Action, Change, and Planning
Computer Science > Artificial Intelligence
arXiv:2503.24378 (cs)
[Submitted on 31 Mar 2025 (v1), last revised 27 Feb 2026 (this version, v2)]

Title: ACPBench Hard: Unrestrained Reasoning about Action, Change, and Planning
Authors: Harsha Kokel, Michael Katz, Kavitha Srinivas, Shirin Sohrabi

Abstract: The ACPBench dataset provides atomic reasoning tasks required for efficient planning. The dataset is aimed at distilling the complex plan generation task into separate atomic reasoning tasks in their easiest possible form, boolean or multiple-choice questions, where the model has to choose the right answer from the provided options. While the aim of ACPBench is to test the simplest form of reasoning about action and change, when tasked with planning, a model does not typically have options to choose from and thus the reasoning required for planning dictates an open-ended, generative form for these tasks. To that end, we introduce ACPBench Hard, a generative version of ACPBench, with open-ended questions which the model needs to answer. Models that perform well on these tasks could in principle be integrated into a planner or be used directly as a policy. We discuss the complexity of these tasks as well as the complexity of validating the correctness of their answers and present validation algorithms for each task...
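The contrast the abstract draws between picking an option and generating an answer can be illustrated with a small sketch. It is not taken from the paper: the STRIPS-style `Action` class, the toy state, and the functions `check_multiple_choice` and `check_generated_action` are all hypothetical names chosen for illustration, and the paper's actual tasks and validation algorithms may differ. The point is only that a multiple-choice answer can be checked by comparison against a key, while a free-form answer has to be checked against the semantics of the planning problem.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical STRIPS-style action; not the paper's representation."""
    name: str
    preconditions: frozenset


def check_multiple_choice(chosen: str, correct: str) -> bool:
    # Multiple-choice setting: compare the chosen option with the answer key.
    return chosen == correct


def check_generated_action(state: frozenset, actions: dict, generated: str) -> bool:
    # Generative setting (assumed task: "name one applicable action").
    # Any applicable action is a correct answer, so the validator must
    # look the answer up and test its preconditions against the state.
    action = actions.get(generated.strip().lower())
    return action is not None and action.preconditions <= state


# Toy problem, invented for this example.
state = frozenset({"at(robot, room1)", "door_open(room1, room2)"})
actions = {
    "move(room1, room2)": Action(
        "move(room1, room2)",
        frozenset({"at(robot, room1)", "door_open(room1, room2)"}),
    ),
    "move(room2, room1)": Action(
        "move(room2, room1)",
        frozenset({"at(robot, room2)", "door_open(room1, room2)"}),
    ),
}
```

Under these assumptions, `check_generated_action(state, actions, "move(room1, room2)")` accepts the answer, while an action whose preconditions do not hold, or a name that is not an action of the problem at all, is rejected.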