[2602.23546] Humans and LLMs Diverge on Probabilistic Inferences
Computer Science > Computation and Language

arXiv:2602.23546 (cs) [Submitted on 26 Feb 2026]

Title: Humans and LLMs Diverge on Probabilistic Inferences
Authors: Gaurav Kamath, Sreenath Madathil, Sebastian Schuster, Marie-Catherine de Marneffe, Siva Reddy

Abstract: Human reasoning often involves working over limited information to arrive at probabilistic conclusions. In its simplest form, this involves making an inference that is not strictly entailed by a premise, but rather only likely given the premise. While reasoning LLMs have demonstrated strong performance on logical and mathematical tasks, their behavior on such open-ended, non-deterministic inferences remains largely unexplored. We introduce ProbCOPA, a dataset of 210 handcrafted probabilistic inferences in English, each annotated for inference likelihood by 25--30 human participants. We find that human responses are graded and varied, revealing probabilistic judgments of the inferences in our dataset. Comparing these judgments with responses from eight state-of-the-art reasoning LLMs, we show that models consistently fail to produce human-like distributions. Finally, analyzing LLM reasoning chains, we find evidence of a common reasoning pattern used to evaluate such inferences. Our findings reveal persistent differences between humans and LLMs, and underscore the need t...
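The abstract does not specify how human and model distributions are compared, but a comparison like the one described can be sketched with a standard symmetric divergence measure. The sketch below uses Jensen-Shannon divergence over a hypothetical five-point likelihood scale; the example distributions and the choice of JSD are illustrative assumptions, not the paper's method.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q), in bits, for discrete distributions."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded in [0, 1] with log base 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical data: fraction of ~25-30 annotators choosing each likelihood
# rating (e.g. "very unlikely" ... "very likely") for one inference, versus a
# model's response distribution over the same ratings.
human = [0.05, 0.15, 0.40, 0.30, 0.10]   # graded, spread-out judgments
model = [0.00, 0.00, 0.05, 0.15, 0.80]   # peaked, over-confident

print(round(js_divergence(human, model), 3))
```

A high divergence for the peaked model distribution versus the graded human one reflects the kind of mismatch the abstract reports; identical distributions would score 0.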