[2506.12433] Exploring Cultural Variations in Moral Judgments with Large Language Models
Computer Science > Computation and Language
arXiv:2506.12433 (cs)
[Submitted on 14 Jun 2025 (v1), last revised 4 Jan 2026 (this version, v2)]

Title: Exploring Cultural Variations in Moral Judgments with Large Language Models
Authors: Hadi Mohammadi, Ayoub Bagheri

Abstract: Large Language Models (LLMs) have shown strong performance across many tasks, but their ability to capture culturally diverse moral values remains unclear. In this paper, we examine whether LLMs mirror variations in moral attitudes reported by the World Values Survey (WVS) and the Pew Research Center's Global Attitudes Survey (PEW). We compare smaller monolingual and multilingual models (GPT-2, OPT, BLOOMZ, and Qwen) with recent instruction-tuned models (GPT-4o, GPT-4o-mini, Gemma-2-9b-it, and Llama-3.3-70B-Instruct). Using log-probability-based "moral justifiability" scores, we correlate each model's outputs with survey data covering a broad set of ethical topics. Our results show that many earlier or smaller models often produce near-zero or negative correlations with human judgments. In contrast, advanced instruction-tuned models achieve substantially higher positive correlations, suggesting they better reflect real-world moral attitudes. We provide a detailed regional analysis revealing that models align better with Western, E...
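The abstract's pipeline (a log-probability-based justifiability score per topic, correlated with survey means) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the prompt wording, the completion pair ("morally justifiable" vs. "never justifiable"), the `sequence_logprob` stand-in, and all numeric values are hypothetical; in practice the log-probabilities would come from an LLM and the survey means from WVS/PEW data.

```python
import math

# Hypothetical stand-in: a real implementation would sum an LLM's token
# log-probabilities for `completion` given `prompt`. Here we look the value
# up in a toy table so the sketch is self-contained.
def sequence_logprob(prompt: str, completion: str, table: dict) -> float:
    return table[(prompt, completion)]

def justifiability_score(topic: str, table: dict) -> float:
    """Log-probability-based moral justifiability score (one plausible form):
    log P("morally justifiable" | prompt) - log P("never justifiable" | prompt)."""
    prompt = f"In my view, {topic} is"
    return (sequence_logprob(prompt, " morally justifiable", table)
            - sequence_logprob(prompt, " never justifiable", table))

def pearson(xs, ys):
    """Pearson correlation between model scores and survey means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    vy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (vx * vy)

# Toy log-probability table for three example topics (invented values).
table = {
    ("In my view, divorce is", " morally justifiable"): -2.0,
    ("In my view, divorce is", " never justifiable"): -5.0,
    ("In my view, euthanasia is", " morally justifiable"): -3.0,
    ("In my view, euthanasia is", " never justifiable"): -4.0,
    ("In my view, the death penalty is", " morally justifiable"): -4.5,
    ("In my view, the death penalty is", " never justifiable"): -3.0,
}
topics = ["divorce", "euthanasia", "the death penalty"]
model_scores = [justifiability_score(t, table) for t in topics]
survey_means = [7.5, 5.0, 3.0]  # hypothetical WVS-style 1-10 justifiability means
r = pearson(model_scores, survey_means)
print(f"correlation with survey: {r:.3f}")
```

With these invented numbers the model's score ordering matches the survey ordering, so the correlation comes out strongly positive; the paper's finding is that such correlations are near zero or negative for older/smaller models and substantially positive for instruction-tuned ones.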