[2509.05249] COGITAO: A Visual Reasoning Framework To Study Compositionality & Generalization
Summary
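COGITAO is a modular, extensible data generation framework and benchmark for studying compositionality and generalization in visual reasoning. It generates rule-based tasks by composing transformations applied to objects in grid-like environments, and its baseline experiments expose generalization limits in current machine learning models.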
Why It Matters
Understanding compositionality and generalization is crucial for advancing AI capabilities. COGITAO provides a comprehensive tool for researchers to explore these concepts, potentially leading to improved machine learning models that better mimic human reasoning.
Key Takeaways
- COGITAO is a modular framework for generating visual reasoning tasks.
- It supports composition, at adjustable depth, over 28 interoperable transformations, yielding millions of unique task rules with virtually unlimited samples per rule.
- Baseline experiments reveal that existing models struggle to generalize despite strong performance in familiar contexts.
- The framework is open-sourced, promoting collaboration and further research.
- Insights from COGITAO could inform future advancements in AI and machine learning.
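The "millions of unique task rules" figure can be sanity-checked with a rough count. The sketch below assumes a rule is an ordered sequence of transformations drawn with repetition from the 28 cited in the abstract. That counting scheme is an assumption made here for illustration: the text does not say whether every ordering is valid, nor how grid and object parameters enter the total. The function names are hypothetical.

```python
# Back-of-the-envelope count, NOT the paper's own accounting.
# Assumption: a rule is an ordered sequence of transformations,
# with repetition allowed, drawn from a fixed set.
NUM_TRANSFORMATIONS = 28  # figure cited in the abstract


def count_sequences(num_transformations: int, depth: int) -> int:
    """Number of ordered sequences of exactly `depth` transformations."""
    return num_transformations ** depth


def count_up_to(num_transformations: int, max_depth: int) -> int:
    """Number of ordered sequences of length 1 through `max_depth`."""
    return sum(
        count_sequences(num_transformations, d)
        for d in range(1, max_depth + 1)
    )


for depth in range(1, 6):
    print(depth, count_sequences(NUM_TRANSFORMATIONS, depth))
```

Under this assumption the count passes one million between depth 4 (614,656 sequences) and depth 5 (17,210,368 sequences), so composition depth alone is enough to reach the stated scale.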
Computer Science > Computer Vision and Pattern Recognition
arXiv:2509.05249 (cs)
[Submitted on 5 Sep 2025 (v1), last revised 17 Feb 2026 (this version, v2)]
Title: COGITAO: A Visual Reasoning Framework To Study Compositionality & Generalization
Authors: Yassine Taoudi-Benchekroun, Klim Troyan, Pascal Sager, Stefan Gerber, Lukas Tuggener, Benjamin Grewe
Abstract: The ability to compose learned concepts and apply them in novel settings is key to human intelligence, but remains a persistent limitation in state-of-the-art machine learning models. To address this issue, we introduce COGITAO, a modular and extensible data generation framework and benchmark designed to systematically study compositionality and generalization in visual domains. Drawing inspiration from ARC-AGI's problem-setting, COGITAO constructs rule-based tasks which apply a set of transformations to objects in grid-like environments. It supports composition, at adjustable depth, over a set of 28 interoperable transformations, along with extensive control over grid parametrization and object properties. This flexibility enables the creation of millions of unique task rules -- surpassing concurrent datasets by several orders of magnitude -- across a wide range of difficulties, while allowing virtually unlimited sample generation per rule. We prov...
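The idea of a rule built by composing transformations at adjustable depth can be sketched in a few lines. This is a minimal illustration, not COGITAO's API: every name below (`flip_horizontal`, `rotate_clockwise`, `recolor`, `compose`) is hypothetical, and for brevity each transformation acts on the whole grid, whereas the abstract describes transformations applied to objects within the grid.

```python
from functools import reduce
from typing import Callable, Sequence

# Hypothetical representation: 0 is background, other integers are colors.
Grid = list[list[int]]
Transform = Callable[[Grid], Grid]


def flip_horizontal(grid: Grid) -> Grid:
    """Mirror each row left to right."""
    return [row[::-1] for row in grid]


def rotate_clockwise(grid: Grid) -> Grid:
    """Rotate the grid by 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def recolor(old: int, new: int) -> Transform:
    """Build a transformation that replaces one color with another."""
    def apply(grid: Grid) -> Grid:
        return [[new if cell == old else cell for cell in row] for row in grid]
    return apply


def compose(transforms: Sequence[Transform]) -> Transform:
    """Chain transformations left to right; depth is len(transforms)."""
    def apply(grid: Grid) -> Grid:
        return reduce(lambda g, t: t(g), transforms, grid)
    return apply


# A depth-3 rule applied to a 2x2 grid with one colored cell.
rule = compose([flip_horizontal, rotate_clockwise, recolor(1, 2)])
grid = [[1, 0],
        [0, 0]]
result = rule(grid)
print(result)
```

Treating each transformation as a pure function from grid to grid keeps rules interoperable: any sequence type-checks, and a held-out test rule can be assembled from transformations that were only seen separately, or in other combinations, during training.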