[2511.11079] ARCTraj: A Dataset and Benchmark of Human Reasoning Trajectories for Abstract Problem Solving

Summary

ARCTraj introduces a dataset and framework for modeling human reasoning in abstract problem-solving, providing insights into the iterative nature of reasoning through visual tasks.

Why It Matters

Most ARC datasets record only static input-output pairs, which says little about how a solution is reached. ARCTraj instead captures the step-by-step actions people take while solving a task, giving researchers process-level data for studying interpretability and alignment with human-like reasoning. This could support the development of AI systems that better model human problem-solving.

Key Takeaways

  • ARCTraj captures temporally ordered human reasoning actions.
  • The dataset includes around 10,000 trajectories across 400 training tasks from the ARC-AGI-1 benchmark.
  • It enables integration with various AI modeling techniques.
  • The framework enhances explainability and generalizable intelligence.
  • Analyses reveal insights into spatial selection and strategic convergence.

Computer Science > Artificial Intelligence
arXiv:2511.11079 (cs)
[Submitted on 14 Nov 2025 (v1), last revised 15 Feb 2026 (this version, v3)]

Title: ARCTraj: A Dataset and Benchmark of Human Reasoning Trajectories for Abstract Problem Solving
Authors: Sejin Kim, Hayan Choi, Seokki Lee, Sundong Kim

Abstract: We present ARCTraj, a dataset and methodological framework for modeling human reasoning through complex visual tasks in the Abstraction and Reasoning Corpus (ARC). While ARC has inspired extensive research on abstract reasoning, most existing approaches rely on static input-output supervision, which limits insight into how reasoning unfolds over time. ARCTraj addresses this gap by recording temporally ordered, object-level actions that capture how humans iteratively transform inputs into outputs, revealing intermediate reasoning steps that conventional datasets overlook. Collected via the O2ARC web interface, it contains around 10,000 trajectories annotated with task identifiers, timestamps, and success labels across 400 training tasks from the ARC-AGI-1 benchmark. It further defines a unified reasoning pipeline encompassing data collection, action abstraction, Markov decision process (MDP) formulation, and downstream learning, enabling integration with reinforcement learning, generative ...
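
To make the MDP framing in the abstract more concrete, here is a minimal Python sketch of how a temporally ordered trajectory might be replayed into (state, action, next state) transitions. It is an illustration only: the `Action` and `Trajectory` classes, their field names, and the single toy "paint" action are assumptions made for this example, not the actual ARCTraj schema or the O2ARC action set.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Immutable ARC-style grid of color indices (an assumed representation).
Grid = Tuple[Tuple[int, ...], ...]


@dataclass(frozen=True)
class Action:
    """Hypothetical logged action; field names are illustrative."""
    name: str         # e.g. "paint" (toy action, not the real action set)
    row: int
    col: int
    color: int
    timestamp: float  # seconds since the trajectory started


@dataclass
class Trajectory:
    """Hypothetical trajectory record with the annotations the abstract lists."""
    task_id: str
    actions: List[Action]
    success: bool


def apply_action(grid: Grid, action: Action) -> Grid:
    """Toy transition function: only 'paint' changes the grid."""
    if action.name != "paint":
        return grid
    rows = [list(r) for r in grid]
    rows[action.row][action.col] = action.color
    return tuple(tuple(r) for r in rows)


def to_transitions(initial: Grid, traj: Trajectory):
    """Replay actions in timestamp order as (state, action, next_state) triples."""
    state = initial
    transitions = []
    for action in sorted(traj.actions, key=lambda a: a.timestamp):
        next_state = apply_action(state, action)
        transitions.append((state, action, next_state))
        state = next_state
    return transitions


if __name__ == "__main__":
    start: Grid = ((0, 0), (0, 0))
    demo = Trajectory(
        task_id="task-demo",  # placeholder identifier
        actions=[Action("paint", 1, 1, 3, 2.0), Action("paint", 0, 0, 5, 1.0)],
        success=True,
    )
    for state, action, next_state in to_transitions(start, demo):
        print(action.name, action.timestamp, next_state)
```

Keeping every intermediate grid, rather than only the final output, is what separates trajectory data of this kind from static input-output supervision. A downstream learner could consume the triples directly, for example as demonstrations for imitation or reinforcement learning.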
