[2602.18843] ABD: Default Exception Abduction in Finite First Order Worlds

Summary

The paper introduces ABD, a benchmark for default-exception abduction in finite first-order worlds. It evaluates LLMs on their ability to write first-order formulas that define exceptions and restore satisfiability while keeping those exceptions sparse.

Why It Matters

ABD tests whether LLMs can restore the satisfiability of a background theory by defining exceptions in first-order logic, with answers checked by exact SMT verification. By formalizing three observation regimes and evaluating ten frontier LLMs on 600 instances, the paper shows where current models fall short: the best achieve high validity, but parsimony gaps remain, and holdout evaluation exposes generalization failures that differ by regime.

Key Takeaways

  • ABD is a benchmark for default-exception abduction over finite first-order worlds.
  • The study formalizes three observation regimes (closed-world, existential completion, and universal completion), each with exact SMT verification.
  • Across ten frontier LLMs and 600 instances, the best models achieve high validity, but parsimony gaps remain.
  • Holdout evaluation reveals distinct generalization failure modes across the regimes.
  • The results point to sparse exception definition and generalization as open problems for current models.

arXiv:2602.18843 (cs)

[Submitted on 21 Feb 2026]

Title: ABD: Default Exception Abduction in Finite First Order Worlds

Authors: Serafim Batzoglou

Abstract: We introduce ABD, a benchmark for default-exception abduction over finite first-order worlds. Given a background theory with an abnormality predicate and a set of relational structures, a model must output a first-order formula that defines exceptions, restoring satisfiability while keeping exceptions sparse. We formalize three observation regimes (closed-world, existential completion, universal completion) with exact SMT verification. Evaluating ten frontier LLMs on 600 instances, the best models achieve high validity but parsimony gaps remain, and holdout evaluation reveals distinct generalization failure modes across regimes.

Subjects: Artificial Intelligence (cs.AI); Symbolic Computation (cs.SC)

Cite as: arXiv:2602.18843 [cs.AI] (or arXiv:2602.18843v1 [cs.AI] for this version)

DOI: https://doi.org/10.48550/arXiv.2602.18843 (arXiv-issued DOI via DataCite, pending registration)

Submission history: From Serafim Batzoglou. [v1] Sat, 21 Feb 2026 14:14:35 UTC (57 KB)
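
To make the task in the abstract concrete, here is a minimal toy sketch in Python. It is not the benchmark's actual format or verifier: the default rule (birds fly unless abnormal), the predicate names, the two small worlds, and the helpers `holds`, `is_valid`, and `score` are all hypothetical, and the check is brute-force enumeration over fully observed worlds rather than the exact SMT verification the paper describes. It only illustrates the two quantities at stake: whether a candidate exception definition restores satisfiability (validity), and how many elements it marks as exceptions (parsimony).

```python
# Toy illustration only: hypothetical predicates, worlds, and helper names.
# Default rule assumed here: for all x, Bird(x) and not Ab(x) implies Flies(x).

def holds(world, pred, x):
    """True if unary predicate `pred` is recorded as true of element x."""
    return x in world["facts"].get(pred, set())

def is_valid(world, ab):
    """Check the default rule on every element, given a candidate Ab definition."""
    return all(
        (not holds(world, "Bird", x)) or ab(world, x) or holds(world, "Flies", x)
        for x in world["domain"]
    )

def score(worlds, ab):
    """Return (valid in all worlds, total number of elements marked abnormal)."""
    valid = all(is_valid(w, ab) for w in worlds)
    cost = sum(1 for w in worlds for x in w["domain"] if ab(w, x))
    return valid, cost

worlds = [
    {"domain": {"a", "b", "c"},
     "facts": {"Bird": {"a", "b"}, "Penguin": {"b"}, "Flies": {"a"}}},
    {"domain": {"d", "e"},
     "facts": {"Bird": {"d", "e"}, "Penguin": {"e"}, "Flies": {"d"}}},
]

# Three candidate definitions of the abnormality predicate Ab(x).
ab_penguin = lambda w, x: holds(w, "Penguin", x)   # Ab(x) := Penguin(x)
ab_all_birds = lambda w, x: holds(w, "Bird", x)    # Ab(x) := Bird(x)
ab_none = lambda w, x: False                       # Ab(x) := false

for name, ab in [("penguin", ab_penguin), ("all_birds", ab_all_birds), ("none", ab_none)]:
    print(name, score(worlds, ab))
```

In this toy setup, `ab_penguin` and `ab_all_birds` both satisfy the rule, but the second marks every bird as an exception, which is the kind of gap a parsimony measure is meant to expose. `ab_none` marks nothing and leaves the rule violated.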
