[2603.00206] TACIT Benchmark: A Programmatic Visual Reasoning Benchmark for Generative and Discriminative Models
Computer Science > Computer Vision and Pattern Recognition
arXiv:2603.00206 (cs)
[Submitted on 27 Feb 2026]

Title: TACIT Benchmark: A Programmatic Visual Reasoning Benchmark for Generative and Discriminative Models
Authors: Daniel Nobrega Medeiros

Abstract: Existing visual reasoning benchmarks predominantly rely on natural language prompts, evaluate narrow reasoning modalities, or depend on subjective scoring procedures such as LLM-as-judge. We introduce the TACIT Benchmark, a programmatic visual reasoning benchmark comprising 10 tasks across 6 reasoning domains: spatial navigation, abstract pattern completion, causal simulation, logical constraint satisfaction, graph theory, and topology. The benchmark provides dual-track evaluation: a generative track in which models must produce solution images verified through deterministic computer-vision pipelines, and a discriminative track offering five-way multiple choice with structurally plausible near-miss distractors. Each distractor violates exactly one structural constraint, requiring models to reason about fine-grained visual differences rather than exploit superficial cues. Version 0.1.0 distributes 6,000 puzzles (108,000 PNG images across three resolutions) with fully deterministic seeded generation and reproducible verification. The datase...
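To illustrate the kind of pipeline the abstract describes, the following is a minimal, hypothetical sketch of deterministic seeded puzzle generation with a deterministic verifier, using a toy spatial-navigation task. The function names and parameters (`generate_puzzle`, `verify_path`, the grid size, the blocking probability) are assumptions for illustration, not the actual TACIT implementation:

```python
import random

def generate_puzzle(seed, size=5):
    """Toy spatial-navigation puzzle: a start cell, a goal cell, and a set of
    blocked cells, all derived deterministically from the seed (hypothetical
    stand-in for TACIT's seeded generation)."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(size) for c in range(size)]
    start, goal = rng.sample(cells, 2)
    blocked = {c for c in cells if c not in (start, goal) and rng.random() < 0.2}
    return {"start": start, "goal": goal, "blocked": blocked, "size": size}

def verify_path(puzzle, path):
    """Deterministic verifier: the path must begin at start, end at goal,
    move one 4-connected step at a time, stay on the grid, and avoid
    blocked cells -- no subjective judging involved."""
    if not path or path[0] != puzzle["start"] or path[-1] != puzzle["goal"]:
        return False
    for (r1, c1), (r2, c2) in zip(path, path[1:]):
        if abs(r1 - r2) + abs(c1 - c2) != 1:  # reject diagonal or multi-cell moves
            return False
    n = puzzle["size"]
    return all(0 <= r < n and 0 <= c < n and (r, c) not in puzzle["blocked"]
               for r, c in path)
```

Because generation is seeded, calling `generate_puzzle(42)` twice yields identical puzzles, and `verify_path` returns the same verdict on every run, mirroring the reproducibility claim in the abstract.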