[2506.11128] Theory-Grounded Evaluation of Human-Like Fallacy Patterns in LLM Reasoning


arXiv - AI 3 min read


Computer Science > Computation and Language

arXiv:2506.11128 (cs) [Submitted on 10 Jun 2025 (v1), last revised 20 Mar 2026 (this version, v3)]

Title: Theory-Grounded Evaluation of Human-Like Fallacy Patterns in LLM Reasoning

Authors: Andrew Keenan Richardson, Ryan Othniel Kearns, Sean Moss, Vincent Wang-Mascianica, Philipp Koralus

Abstract: We study logical reasoning in language models by asking whether their errors follow established human fallacy patterns. Using the Erotetic Theory of Reasoning (ETR) and its open-source implementation, PyETR, we programmatically generate 383 formally specified reasoning problems and evaluate 38 models. For each response, we judge logical correctness and, when incorrect, whether it matches an ETR-predicted fallacy. Two results stand out: (i) as a capability proxy (Chatbot Arena Elo) increases, a larger share of a model's incorrect answers are ETR-predicted fallacies $(\rho=0.360, p=0.0265)$, while overall correctness on this dataset shows no correlation with capability; (ii) reversing premise order significantly reduces fallacy production for many models, mirroring human order effects. Methodologically, PyETR provides an open-source pipeline for unbounded, synthetic, contamination-resistant reasoning tests linked to a cognitive theory, enabling analyses that fo...
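The headline result (i) is a Spearman rank correlation between a model's capability proxy and the share of its wrong answers that match an ETR-predicted fallacy. A minimal sketch of that computation, using invented model data for illustration (the paper evaluates 38 real models and reports $\rho=0.360, p=0.0265$; nothing below reproduces its numbers):

```python
# Hedged sketch of the paper's headline analysis: rank-correlating a
# capability proxy (Chatbot Arena Elo) with the share of a model's
# incorrect answers that match an ETR-predicted fallacy.
# All Elo values and fallacy shares below are invented for illustration.

def ranks(xs):
    """Return 1-based ranks of xs (assumes no ties, as in this toy data)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_rho(xs, ys):
    """Spearman rank correlation for tie-free data: 1 - 6*sum(d^2)/(n(n^2-1))."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# (hypothetical Elo, hypothetical share of incorrect answers that are
#  ETR-predicted fallacies) for five imaginary models
elos = [1050, 1120, 1180, 1230, 1290]
fallacy_shares = [0.20, 0.25, 0.22, 0.31, 0.35]

print(f"rho = {spearman_rho(elos, fallacy_shares):.3f}")  # rho = 0.900 on this toy data
```

A positive rho here would mean that, as the capability proxy rises, a larger fraction of a model's mistakes are the specific fallacies ETR predicts, which is the pattern the paper reports.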

Originally published on March 24, 2026. Curated by AI News.


