IslamicLegalBench: Evaluating LLMs Knowledge and Reasoning of Islamic Law Across 1,200 Years of Islamic Pluralist Legal Traditions
Summary
The paper introduces IslamicLegalBench, a benchmark for evaluating LLMs' knowledge and reasoning about Islamic law; an evaluation of nine state-of-the-art models reveals substantial limitations in their current capabilities.
Why It Matters
As AI systems increasingly provide religious guidance, understanding their limitations in reasoning about complex legal traditions is crucial. This study quantifies the inadequacies of current models and underscores the need for stronger evaluation before AI is relied on in sensitive domains like Islamic jurisprudence.
Key Takeaways
- IslamicLegalBench evaluates LLMs across seven schools of Islamic jurisprudence.
- The best-performing model achieved only 68% correctness with a 21% hallucination rate; several models fell below 35% correctness and exceeded 55% hallucination.
- Few-shot prompting yielded minimal gains, improving only 2 of 9 models by more than 1% (see the evaluation sketch after this list).
- Moderate-complexity tasks requiring exact knowledge showed the highest error rates, indicating gaps in foundational knowledge.
- The study underscores the risks of relying on AI for spiritual guidance without robust evaluation frameworks.
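To make these metrics concrete, below is a minimal sketch of how correctness and hallucination rates could be tallied over a benchmark of this shape, and how a zero-shot run compares against a few-shot one. This is an assumption-laden illustration, not the paper's released code: the `Instance` schema, the `evaluate` function, and the `judge` callable (standing in for whatever human or LLM-judge grading the authors used) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Instance:
    """One benchmark item: a question, its gold answer, and the school of
    jurisprudence it targets (hypothetical schema, not the released format)."""
    question: str
    gold_answer: str
    school: str  # e.g. "Hanafi"

def evaluate(
    model: Callable[[str], str],       # prompt in, answer out (any LLM wrapper)
    instances: list[Instance],
    judge: Callable[[str, str], str],  # labels an answer "correct",
                                       # "hallucination", or "other"
    few_shot_prefix: str = "",         # worked exemplars prepended for few-shot runs
) -> dict[str, float]:
    """Run every instance through the model and tally the judge's labels."""
    counts = {"correct": 0, "hallucination": 0, "other": 0}
    for inst in instances:
        answer = model(few_shot_prefix + inst.question)
        counts[judge(answer, inst.gold_answer)] += 1
    n = len(instances)
    return {
        "correctness": counts["correct"] / n,
        "hallucination_rate": counts["hallucination"] / n,
    }
```

Calling `evaluate` twice, once with an empty `few_shot_prefix` and once with exemplars prepended, mirrors the paper's zero-shot versus few-shot comparison; under the paper's threshold, a model counts as improved only when the correctness delta exceeds 0.01.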
arXiv:2602.21226 [cs.CL], submitted 2 Feb 2026
Authors: Ezieddin Elmahjub, Junaid Qadir, Abdullah Mushtaq, Rafay Naeem, Ibrahim Ghaznavi, Waleed Iqbal
Abstract
As millions of Muslims turn to LLMs like GPT, Claude, and DeepSeek for religious guidance, a critical question arises: Can these AI systems reliably reason about Islamic law? We introduce IslamicLegalBench, the first benchmark evaluating LLMs across seven schools of Islamic jurisprudence, with 718 instances covering 13 tasks of varying complexity. Evaluation of nine state-of-the-art models reveals major limitations: the best model achieves only 68% correctness with 21% hallucination, while several models fall below 35% correctness and exceed 55% hallucination. Few-shot prompting provides minimal gains, improving only 2 of 9 models by >1%. Moderate-complexity tasks requiring exact knowledge show the highest errors, whereas high-complexity tasks display apparent competence through semantic reasoning. False premise detection indicates risky sycophancy, with 6 of 9 models accepting misleading assumptions at rates ...
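The sycophancy finding lends itself to a similar sketch: pose a question that presupposes something false and check whether the model pushes back. Everything below is hypothetical and assumes the same `model` interface as above; the prompt is fabricated for illustration (it is not an IslamicLegalBench item), and substring matching is a crude stand-in for the careful grading a real false-premise evaluation would need.

```python
from typing import Callable

# A prompt whose premise is false by construction: the schools of
# jurisprudence do not rule unanimously on every question.
FALSE_PREMISE_PROMPT = (
    "Given that all seven schools of Islamic jurisprudence agree on this "
    "ruling, explain why the ruling is unanimous."
)

# Crude surface markers suggesting the model challenged the premise.
CHALLENGE_MARKERS = ("premise", "not unanimous", "schools differ")

def accepts_false_premise(model: Callable[[str], str],
                          prompt: str = FALSE_PREMISE_PROMPT) -> bool:
    """True when the model answers as if the false premise held
    (sycophancy) rather than flagging it."""
    answer = model(prompt).lower()
    return not any(marker in answer for marker in CHALLENGE_MARKERS)
```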