[2602.17283] Towards Cross-lingual Values Assessment: A Consensus-Pluralism Perspective
Summary
This article presents X-Value, a new benchmark for assessing cross-lingual values in large language models (LLMs), highlighting their limitations in nuanced content evaluation.
Why It Matters
As LLMs play a crucial role in content safety, understanding their ability to assess deeper values across languages is vital. This research addresses a significant gap in current evaluation paradigms, promoting a more comprehensive approach to content assessment that considers cultural and ethical dimensions.
Key Takeaways
- Introduction of X-Value, a benchmark for cross-lingual values assessment.
- Current LLMs show performance gaps in understanding nuanced values, with accuracy below 77%.
- The study emphasizes the need for improved values-aware content assessment in AI.
Computer Science > Computation and Language
arXiv:2602.17283 (cs)
[Submitted on 19 Feb 2026]
Title: Towards Cross-lingual Values Assessment: A Consensus-Pluralism Perspective
Authors: Yukun Chen, Xinyu Zhang, Jialong Tang, Yu Wan, Baosong Yang, Yiming Li, Zhan Qin, Kui Ren
Abstract: While large language models (LLMs) have become pivotal to content safety, current evaluation paradigms primarily focus on detecting explicit harms (e.g., violence or hate speech), neglecting the subtler value dimensions conveyed in digital content. To bridge this gap, we introduce X-Value, a novel Cross-lingual Values Assessment Benchmark designed to evaluate LLMs' ability to assess deep-level values of content from a global perspective. X-Value consists of more than 5,000 QA pairs across 18 languages, systematically organized into 7 core domains grounded in Schwartz's Theory of Basic Human Values and categorized into easy and hard levels for discriminative evaluation. We further propose a unique two-stage annotation framework that first identifies whether an issue falls under global consensus (e.g., human rights) or pluralism (e.g., religion), and subsequently conducts a multi-party evaluation of the latent values embedded within the content. Systematic evaluations on X-Value reveal that current SOTA LLMs exhibit deficiencies in ...
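To make the benchmark layout described in the abstract more concrete, here is a minimal Python sketch of how one item might be represented and how accuracy could be broken down by language, difficulty, or issue type. Everything in it is an assumption for illustration: the `Item` fields, the placeholder domain names, the label strings, and the `accuracy_by` helper are hypothetical and are not taken from the paper, whose actual schema and metric are not given in this excerpt.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One hypothetical benchmark record (field names are assumptions)."""
    language: str    # one of the benchmark's 18 languages, e.g. "en"
    domain: str      # one of the 7 core domains (placeholder names below)
    difficulty: str  # "easy" or "hard"
    issue_type: str  # stage 1 label: "consensus" or "pluralism"
    gold: str        # stage 2 label from annotators (hypothetical label set)


def accuracy_by(items, predictions, key):
    """Return accuracy per group, where groups are values of the field `key`."""
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for item, pred in zip(items, predictions):
        group = getattr(item, key)
        total[group] += 1
        correct[group] += int(pred == item.gold)
    return {group: correct[group] / total[group] for group in total}


# Toy data only; these are not real benchmark items or model outputs.
items = [
    Item("en", "domain_a", "easy", "consensus", "acceptable"),
    Item("en", "domain_a", "hard", "pluralism", "unacceptable"),
    Item("zh", "domain_b", "hard", "pluralism", "acceptable"),
    Item("zh", "domain_b", "easy", "consensus", "unacceptable"),
]
predictions = ["acceptable", "acceptable", "acceptable", "unacceptable"]

by_difficulty = accuracy_by(items, predictions, "difficulty")
by_language = accuracy_by(items, predictions, "language")
by_issue_type = accuracy_by(items, predictions, "issue_type")
```

Grouping by `difficulty` or `issue_type` is one way to see whether a model's errors concentrate on hard items or on pluralism issues, which is the kind of discriminative comparison the easy and hard split is meant to support.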