[2602.18956] INDUCTION: Finite-Structure Concept Synthesis in First-Order Logic
Summary
The paper introduces INDUCTION, a benchmark for finite-structure concept synthesis in first-order logic. Given small finite relational worlds with extensionally labeled target predicates, a model must output a single formula that explains the target in every world.
Why It Matters
Concept synthesis tests whether a model can induce a general rule from labeled examples. INDUCTION makes this measurable: correctness is verified by exact model checking, and formula bloat is penalized. The results show sharp difficulty gradients, structural families that stay hard, and qualitatively different behavior among recent models, which suggests they generalize concepts in different ways.
Key Takeaways
- INDUCTION is a benchmark for finite-structure concept synthesis in first-order logic.
- Models must output a single first-order formula that explains an extensionally labeled target predicate uniformly across several small relational worlds.
- Correctness is verified by exact model checking.
- The benchmark has three regimes: FullObs, CI (contrastive), and EC (existential completion). It penalizes formula bloat.
- Low-bloat formulas generalize far better on held-out worlds.
- Recent top models behave qualitatively differently across tasks and metrics, hinting at different strategies of concept generalization.
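To make the verification step concrete, here is a minimal sketch of exact model checking over finite relational worlds. It is not the benchmark's code. The tuple encoding of formulas, the world layout, the relation name E, and the helper names (holds, extension, explains_all, size) are all assumptions made for illustration. In particular, size counts syntax-tree nodes as a stand-in for bloat; the paper's actual bloat measure is not given in the abstract.

```python
from itertools import product

# Hypothetical encoding (not from the paper): formulas are nested tuples, e.g.
#   ("rel", "E", ("x", "y")), ("eq", "x", "y"), ("not", f), ("and", f, g),
#   ("or", f, g), ("implies", f, g), ("forall", "y", f), ("exists", "y", f)

def holds(formula, world, env):
    """Evaluate `formula` in `world` under the variable assignment `env`."""
    op = formula[0]
    if op == "rel":
        _, name, args = formula
        return tuple(env[a] for a in args) in world["relations"][name]
    if op == "eq":
        return env[formula[1]] == env[formula[2]]
    if op == "not":
        return not holds(formula[1], world, env)
    if op == "and":
        return holds(formula[1], world, env) and holds(formula[2], world, env)
    if op == "or":
        return holds(formula[1], world, env) or holds(formula[2], world, env)
    if op == "implies":
        return (not holds(formula[1], world, env)) or holds(formula[2], world, env)
    if op in ("forall", "exists"):
        _, var, body = formula
        # Quantifiers range over the finite domain, so evaluation is exact.
        results = (holds(body, world, {**env, var: d}) for d in world["domain"])
        return all(results) if op == "forall" else any(results)
    raise ValueError(f"unknown operator: {op!r}")

def extension(formula, free_vars, world):
    """All tuples of domain elements that satisfy `formula`."""
    return {
        vals
        for vals in product(world["domain"], repeat=len(free_vars))
        if holds(formula, world, dict(zip(free_vars, vals)))
    }

def explains_all(formula, free_vars, worlds):
    """True if the formula's extension equals the labeled target in every world."""
    return all(extension(formula, free_vars, w) == w["target"] for w in worlds)

def size(formula):
    """Syntax-tree node count: an assumed proxy for bloat, not the paper's metric."""
    op = formula[0]
    if op in ("rel", "eq"):
        return 1
    if op == "not":
        return 1 + size(formula[1])
    if op in ("forall", "exists"):
        return 1 + size(formula[2])
    return 1 + size(formula[1]) + size(formula[2])

# Two toy worlds; the target labels elements with at least one outgoing E-edge.
worlds = [
    {"domain": [0, 1, 2], "relations": {"E": {(0, 1), (1, 2)}}, "target": {(0,), (1,)}},
    {"domain": [0, 1], "relations": {"E": {(1, 1)}}, "target": {(1,)}},
]

candidate = ("exists", "y", ("rel", "E", ("x", "y")))   # x has an outgoing edge
wrong = ("exists", "y", ("rel", "E", ("y", "x")))       # x has an incoming edge
bloated = ("and", candidate, ("or", candidate, ("not", candidate)))  # equivalent, larger

print(explains_all(candidate, ("x",), worlds), size(candidate))
print(explains_all(wrong, ("x",), worlds), size(wrong))
print(explains_all(bloated, ("x",), worlds), size(bloated))
```

In this sketch, candidate and bloated both match the target in every world, but bloated has more nodes, so a size-based penalty would prefer candidate. The formula wrong fails in the first world, where the elements with incoming edges differ from the labeled target.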
arXiv:2602.18956 (cs)
[Submitted on 21 Feb 2026]
Title: INDUCTION: Finite-Structure Concept Synthesis in First-Order Logic
Authors: Serafim Batzoglou
Abstract: We introduce INDUCTION, a benchmark for finite structure concept synthesis in first order logic. Given small finite relational worlds with extensionally labeled target predicates, models must output a single first order logical formula that explains the target uniformly across worlds, with correctness verified via exact model checking. The benchmark includes three regimes, FullObs, CI (contrastive), and EC (existential completion), and penalizes formula bloat. We find sharp difficulty gradients, persistent hard structural families, and observe that low bloat formulas generalize far better on held out worlds. Elite recent models show qualitatively different behaviors across tasks and performance metrics, hinting at their different strategies of concept generalization.
Subjects: Artificial Intelligence (cs.AI)
Cite as: arXiv:2602.18956 [cs.AI] (or arXiv:2602.18956v1 [cs.AI] for this version)
https://doi.org/10.48550/arXiv.2602.18956 (arXiv-issued DOI via DataCite, pending registration)
Submission history
From: Serafim Batzoglou
[v1] Sat, 21 Feb 2026 21:21:40 UTC (86 KB)