[2511.11079] ARCTraj: A Dataset and Benchmark of Human Reasoning Trajectories for Abstract Problem Solving
Summary
ARCTraj introduces a dataset and methodological framework for modeling human reasoning on visual tasks from the Abstraction and Reasoning Corpus (ARC), recording how people transform inputs into outputs step by step.
Why It Matters
Most existing ARC approaches rely on static input-output supervision, which limits insight into how reasoning unfolds over time. ARCTraj records the intermediate reasoning steps that such datasets overlook, which could support AI systems that are more interpretable and better aligned with human-like reasoning.
Key Takeaways
- ARCTraj captures temporally ordered human reasoning actions.
- The dataset includes around 10,000 trajectories across 400 training tasks from the ARC-AGI-1 benchmark.
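- Its unified pipeline (data collection, action abstraction, MDP formulation, downstream learning) enables integration with modeling techniques such as reinforcement learning.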
- The framework enhances explainability and generalizable intelligence.
- Analyses reveal insights into spatial selection and strategic convergence.
Computer Science > Artificial Intelligence
arXiv:2511.11079 (cs)
[Submitted on 14 Nov 2025 (v1), last revised 15 Feb 2026 (this version, v3)]
Title: ARCTraj: A Dataset and Benchmark of Human Reasoning Trajectories for Abstract Problem Solving
Authors: Sejin Kim, Hayan Choi, Seokki Lee, Sundong Kim
Abstract: We present ARCTraj, a dataset and methodological framework for modeling human reasoning through complex visual tasks in the Abstraction and Reasoning Corpus (ARC). While ARC has inspired extensive research on abstract reasoning, most existing approaches rely on static input-output supervision, which limits insight into how reasoning unfolds over time. ARCTraj addresses this gap by recording temporally ordered, object-level actions that capture how humans iteratively transform inputs into outputs, revealing intermediate reasoning steps that conventional datasets overlook. Collected via the O2ARC web interface, it contains around 10,000 trajectories annotated with task identifiers, timestamps, and success labels across 400 training tasks from the ARC-AGI-1 benchmark. It further defines a unified reasoning pipeline encompassing data collection, action abstraction, Markov decision process (MDP) formulation, and downstream learning, enabling integration with reinforcement learning, generative ...
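As a rough illustration of how a trajectory annotated with a task identifier, timestamps, and a success label might be turned into MDP-style transitions, here is a minimal Python sketch. The schema is an assumption made for illustration only: the names `Step`, `Trajectory`, `to_transitions`, the action labels, and the choice to store the grid after each action are all hypothetical and are not taken from the ARCTraj release.

```python
from dataclasses import dataclass
from typing import List, Tuple

# An ARC grid, modeled here as a 2D tuple of integer color indices.
Grid = Tuple[Tuple[int, ...], ...]


@dataclass
class Step:
    timestamp: float  # assumed unit: seconds since the trajectory started
    action: str       # hypothetical action label, e.g. "fill"
    grid_after: Grid  # assumed: the grid state right after the action


@dataclass
class Trajectory:
    task_id: str      # hypothetical identifier format
    initial_grid: Grid
    steps: List[Step]
    success: bool


def to_transitions(traj: Trajectory) -> List[Tuple[Grid, str, Grid]]:
    """Pair consecutive grids into (state, action, next_state) tuples."""
    transitions = []
    state = traj.initial_grid
    # Sort by timestamp so the transitions are temporally ordered.
    for step in sorted(traj.steps, key=lambda s: s.timestamp):
        transitions.append((state, step.action, step.grid_after))
        state = step.grid_after
    return transitions


# A made-up two-step trajectory on a 2x2 grid.
example = Trajectory(
    task_id="hypothetical-task",
    initial_grid=((0, 0), (0, 0)),
    steps=[
        Step(0.5, "fill", ((1, 0), (0, 0))),
        Step(1.5, "fill", ((1, 1), (0, 0))),
    ],
    success=True,
)

transitions = to_transitions(example)
```

Each tuple in `transitions` pairs the grid before an action with the action label and the grid after it, so the next state of one transition is the starting state of the following one. A reward signal, which an MDP formulation would also need, is left out because the abstract does not say how it is defined.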