[2505.22318] Flying Pigs, FaR and Beyond: Evaluating LLM Reasoning in Counterfactual Worlds
Computer Science > Computation and Language

arXiv:2505.22318 (cs)

[Submitted on 28 May 2025 (v1), last revised 24 Mar 2026 (this version, v2)]

Title: Flying Pigs, FaR and Beyond: Evaluating LLM Reasoning in Counterfactual Worlds

Authors: Anish R Joishy, Ishwar B Balappanawar, Vamshi Krishna Bonagiri, Manas Gaur, Krishnaprasad Thirunarayan, Ponnurangam Kumaraguru

Abstract: A fundamental challenge in reasoning is navigating hypothetical, counterfactual worlds where logic may conflict with ingrained knowledge. We investigate this frontier for Large Language Models (LLMs) by asking: Can LLMs reason logically when the context contradicts their parametric knowledge? To facilitate a systematic analysis, we first introduce CounterLogic, a benchmark specifically designed to disentangle logical validity from knowledge alignment. Evaluation of 11 LLMs across six diverse reasoning datasets reveals a consistent failure: model accuracy plummets by an average of 14% in counterfactual scenarios compared to knowledge-aligned ones. We hypothesize that this gap stems not from a flaw in logical processing, but from an inability to manage the cognitive conflict between context and knowledge. Inspired by human metacognition, we propose a simple yet powerful intervention: Flag & Reason (FaR), where models are first prompted...
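To make the "disentangle logical validity from knowledge alignment" idea concrete, a minimal sketch of the kind of item pair such a benchmark contrasts is shown below: the same valid syllogistic form, stated once in agreement with world knowledge and once counterfactually. The premises, helper name, and prompt wording here are illustrative assumptions, not taken from CounterLogic itself.

```python
# Illustrative sketch (not the actual CounterLogic format): the same
# logically valid syllogism, once knowledge-aligned and once counterfactual.

def make_item(premise_major: str, premise_minor: str, conclusion: str) -> str:
    """Format a two-premise syllogism as a yes/no entailment question."""
    return (
        f"Premise 1: {premise_major}\n"
        f"Premise 2: {premise_minor}\n"
        f"Question: Does it logically follow that {conclusion}? "
        f"Answer yes or no."
    )

# Knowledge-aligned: the valid conclusion also matches parametric knowledge.
aligned = make_item("All birds have wings.", "Robins are birds.",
                    "robins have wings")

# Counterfactual: identical logical form, but the conclusion contradicts
# world knowledge -- the setting where the paper reports accuracy dropping.
counterfactual = make_item("All pigs can fly.", "Wilbur is a pig.",
                           "Wilbur can fly")

print(counterfactual)
```

Both items have the same valid structure, so any accuracy gap between them isolates the effect of knowledge conflict rather than logical difficulty.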