[2510.07231] EconCausal: A Context-Aware Causal Reasoning Benchmark for Large Language Models in Social Science


arXiv - AI · 4 min read

Summary

EconCausal introduces a benchmark for evaluating causal reasoning in large language models, highlighting their limitations in context-dependent scenarios within social sciences.

Why It Matters

Understanding causal relationships in socio-economic contexts is crucial for informed decision-making. This benchmark reveals significant gaps in current LLMs' capabilities, emphasizing the need for improved models in high-stakes environments where misinterpretation can have serious consequences.

Key Takeaways

  • The EconCausal benchmark comprises 10,490 context-annotated causal triplets drawn from empirical studies.
  • Current LLMs show a sharp decline in accuracy when faced with context shifts and misinformation.
  • Models struggle with recognizing null effects, achieving only 9.5% accuracy in ambiguous cases.
  • The findings highlight risks in economic decision-making due to misinterpretation of causal relationships.
  • The dataset and benchmark are publicly available for further research and development.

Computer Science > Computation and Language
arXiv:2510.07231 (cs)
[Submitted on 8 Oct 2025 (v1), last revised 23 Feb 2026 (this version, v3)]

Title: EconCausal: A Context-Aware Causal Reasoning Benchmark for Large Language Models in Social Science
Authors: Donggyu Lee, Hyeok Yun, Meeyoung Cha, Sungwon Park, Sangyoon Park, Jihee Kim

Abstract: Socio-economic causal effects depend heavily on their specific institutional and environmental context. A single intervention can produce opposite results depending on regulatory or market factors, contexts that are often complex and only partially observed. This poses a significant challenge for large language models (LLMs) in decision-support roles: can they distinguish structural causal mechanisms from surface-level correlations when the context changes? To address this, we introduce EconCausal, a large-scale benchmark comprising 10,490 context-annotated causal triplets extracted from 2,595 high-quality empirical studies published in top-tier economics and finance journals. Through a rigorous four-stage pipeline combining multi-run consensus, context refinement, and multi-critic filtering, we ensure each claim is grounded in peer-reviewed research with explicit identification strategies. Our evaluation reveals critical limitations in current LL…
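The abstract names "multi-run consensus" as the first stage of the extraction pipeline but does not publish its code. As a rough illustration of that idea only (the triplet format, function names, and agreement threshold below are all assumptions, not the authors' implementation), a majority vote over independent extraction passes might look like:

```python
from collections import Counter

def consensus(runs, min_agreement=0.5):
    """Keep only triplets that a strict majority of extraction runs agree on.

    runs: list of sets; each set holds the (cause, effect, context)
    triplets produced by one independent LLM extraction pass.
    min_agreement: fraction of runs a triplet must exceed to be kept.
    """
    counts = Counter(t for run in runs for t in set(run))
    threshold = len(runs) * min_agreement
    return {t for t, c in counts.items() if c > threshold}

# Hypothetical example: three extraction passes over the same paper.
runs = [
    {("min_wage_up", "employment_down", "rigid_labor_market")},
    {("min_wage_up", "employment_down", "rigid_labor_market"),
     ("rates_up", "inflation_down", "open_economy")},  # one-off hallucination
    {("min_wage_up", "employment_down", "rigid_labor_market")},
]
print(consensus(runs))
# The triplet seen in all 3 runs survives; the one-off claim is dropped.
```

The point of the voting step is that a spurious claim extracted in a single pass rarely recurs across independent runs, so requiring majority agreement filters it out before the later refinement and multi-critic stages.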
