[2505.19558] PoliCon: Evaluating LLMs on Achieving Diverse Political Consensus Objectives

Summary

The paper introduces PoliCon, a benchmark that evaluates how well large language models (LLMs) draft consensus resolutions from divergent party positions. It is built from European Parliament deliberation records.

Why It Matters

PoliCon addresses a gap in our understanding of LLMs' capabilities in political contexts, where reaching consensus is crucial for effective governance and decision-making. By evaluating LLMs on their ability to draft consensus resolutions, it offers insight into their biases and their effectiveness in real-world political scenarios.

Key Takeaways

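  • PoliCon is built from 2,225 deliberation records of the European Parliament, dating from 2009 to 2022.
  • The benchmark evaluates LLMs on their ability to draft consensus resolutions from divergent party positions under varying decision-making contexts and political requirements.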
  • Results indicate that current LLMs struggle with complex consensus tasks and exhibit partisan biases.
  • The study highlights the need for improved LLM capabilities in political consensus-building.
  • PoliCon's framework can aid future research in AI's role in governance.

Computer Science > Computers and Society
arXiv:2505.19558 (cs)
[Submitted on 26 May 2025 (v1), last revised 13 Feb 2026 (this version, v3)]

Title: PoliCon: Evaluating LLMs on Achieving Diverse Political Consensus Objectives

Authors: Zhaowei Zhang, Xiaobo Wang, Minghua Yi, Mengmeng Wang, Fengshuo Bai, Zilong Zheng, Yipeng Kang, Yaodong Yang

Abstract: Achieving political consensus is crucial yet challenging for the effective functioning of social governance. However, although frontier AI systems represented by large language models (LLMs) have developed rapidly in recent years, their capabilities in this scope are still understudied. In this paper, we introduce PoliCon, a novel benchmark constructed from 2,225 high-quality deliberation records of the European Parliament over 13 years, ranging from 2009 to 2022, to evaluate the ability of LLMs to draft consensus resolutions based on divergent party positions under varying collective decision-making contexts and political requirements. Specifically, PoliCon incorporates four factors to build each task environment for finding different political consensus: specific political issues, political goals, participating parties, and power structures based on seat distribution. We also developed an evaluation framework based on social choice theory for PoliCon, which...
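To make the four factors in the abstract concrete, here is a minimal Python sketch of how one task environment could be represented. It is an illustration only, not the authors' code or data format: the class name `ConsensusTask`, its fields, the party labels, the seat counts, and the seat-weighted threshold check are all assumptions made for this example, and the threshold check is a generic voting rule rather than the paper's evaluation framework.

```python
from __future__ import annotations

from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class ConsensusTask:
    """Hypothetical container for one task environment (names are assumptions)."""

    issue: str              # the specific political issue under debate
    goal: str               # the political goal the draft should satisfy
    seats: dict[str, int]   # participating parties mapped to their seat counts

    def seat_share(self, supporters: set[str]) -> Fraction:
        """Fraction of all seats held by the parties in `supporters`."""
        unknown = supporters - self.seats.keys()
        if unknown:
            raise ValueError(f"unknown parties: {sorted(unknown)}")
        total = sum(self.seats.values())
        backing = sum(self.seats[p] for p in supporters)
        return Fraction(backing, total)

    def passes(self, supporters: set[str], threshold: Fraction,
               strict: bool = True) -> bool:
        """Seat-weighted threshold check (illustrative rule, not the paper's metric)."""
        share = self.seat_share(supporters)
        return share > threshold if strict else share >= threshold


# Made-up example: three parties with invented seat counts.
task = ConsensusTask(
    issue="example issue",
    goal="simple majority",
    seats={"Party A": 40, "Party B": 35, "Party C": 25},
)
backers = {"Party A", "Party C"}
print(task.seat_share(backers))                               # 13/20
print(task.passes(backers, Fraction(1, 2)))                   # True
print(task.passes(backers, Fraction(2, 3), strict=False))     # False
```

Using `Fraction` keeps the seat arithmetic exact, so a coalition sitting exactly on a threshold is not misjudged because of floating-point rounding.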
