[2410.05669] ACPBench: Reasoning about Action, Change, and Planning

arXiv - AI March 03, 2026 4 min read

arXiv:2410.05669 (cs), Computer Science > Artificial Intelligence
Submitted on 8 Oct 2024 (v1), last revised 27 Feb 2026 (this version, v3)

Title: ACPBench: Reasoning about Action, Change, and Planning

Authors: Harsha Kokel, Michael Katz, Kavitha Srinivas, Shirin Sohrabi

Abstract: There is an increasing body of work using Large Language Models (LLMs) as agents for orchestrating workflows and making decisions in domains that require planning and multi-step reasoning. As a result, it is imperative to evaluate LLMs on core skills required for planning. In this work, we present ACPBench, a benchmark for evaluating the reasoning tasks in the field of planning. The benchmark consists of 7 reasoning tasks over 13 planning domains. The collection is constructed from planning domains described in a formal language. This allows us to synthesize problems with provably correct solutions across many tasks and domains. Further, it allows us the luxury of scale without additional human effort, i.e., many additional problems can be created automatically.

Our extensive evaluation of 22 LLMs and OpenAI o1 reasoning models highlights the significant gap in the reasoning capability of the LLMs. Our findings with OpenAI o1, a multi-turn reasoning model, reveal significant gains in performance on multiple-choice questions, yet surprisingly, no not...

Originally published on March 03, 2026. Curated by AI News.
