[2509.25369] Generative Value Conflicts Reveal LLM Priorities
arXiv - Machine Learning · 4 min read

Summary

This paper introduces ConflictScope, an automatic pipeline for evaluating how large language models (LLMs) prioritize values when two of them conflict, revealing which values models favor when they are forced to trade one off against another.

Why It Matters

Understanding how LLMs navigate value conflicts is crucial for improving their alignment with human values. This research highlights the need for better evaluation methods and offers a foundation for future studies in AI alignment, which is essential for responsible AI deployment.

Key Takeaways

  • ConflictScope evaluates LLMs' prioritization of values in conflict scenarios.
  • In open-ended settings, models shift away from protective values (e.g., harmlessness) and toward personal values (e.g., user autonomy), compared with multiple-choice evaluations.
  • Including a detailed value ordering in the system prompt improves alignment with a target ordering by 14%.
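The third takeaway, steering a model by spelling out a value ordering in its system prompt, can be sketched as follows. This is a hypothetical illustration, not the paper's actual prompt: the value names and template wording are assumptions.

```python
# Hypothetical sketch: encode an explicit value ordering in a system prompt.
# The value list and template are illustrative assumptions, not the paper's.
VALUE_ORDERING = ["harmlessness", "honesty", "user autonomy", "helpfulness"]

def build_system_prompt(ordering):
    """Render a ranked value list into a system-prompt string."""
    ranked = "\n".join(f"{i + 1}. {v}" for i, v in enumerate(ordering))
    return (
        "When values conflict, prioritize them in this order "
        "(1 = highest priority):\n" + ranked
    )

print(build_system_prompt(VALUE_ORDERING))
```

The prompt produced this way would then be prepended to each conflict scenario before querying the target model.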

Computer Science > Computation and Language
arXiv:2509.25369 (cs)
[Submitted on 29 Sep 2025 (v1), last revised 25 Feb 2026 (this version, v2)]

Title: Generative Value Conflicts Reveal LLM Priorities
Authors: Andy Liu, Kshitish Ghate, Mona Diab, Daniel Fried, Atoosa Kasirzadeh, Max Kleiman-Weiner

Abstract: Past work seeks to align large language model (LLM)-based assistants with a target set of values, but such assistants are frequently forced to make tradeoffs between values when deployed. In response to the scarcity of value conflict in existing alignment datasets, we introduce ConflictScope, an automatic pipeline to evaluate how LLMs prioritize different values. Given a user-defined value set, ConflictScope automatically generates scenarios in which a language model faces a conflict between two values sampled from the set. It then prompts target models with an LLM-written "user prompt" and evaluates their free-text responses to elicit a ranking over values in the value set. Comparing results between multiple-choice and open-ended evaluations, we find that models shift away from supporting protective values, such as harmlessness, and toward supporting personal values, such as user autonomy, in more open-ended value conflict settings. However, including detailed value orderings in models' system prompts improves alignment with a target ...
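The abstract's ranking step, judging which value each free-text response supported and aggregating those judgments into an ordering over the value set, can be sketched as a simple win-count tally. This is a minimal illustration under assumed names: the scenario-generation and judging steps (both LLM calls in the paper) are stubbed out as a prelabeled list.

```python
# Minimal sketch of a ConflictScope-style tally: given judged outcomes of
# pairwise value conflicts, derive a priority ranking by win count.
# The scenario generation and LLM judging are stubbed; names are assumptions.
from collections import Counter

def rank_values(values, judged_winners):
    """judged_winners: value names a judge said each response favored."""
    wins = Counter(judged_winners)
    return sorted(values, key=lambda v: wins[v], reverse=True)

values = ["harmlessness", "honesty", "user autonomy"]
# Pretend a judge model labeled which value each free-text response supported:
judged = ["user autonomy", "user autonomy", "harmlessness",
          "honesty", "user autonomy", "honesty"]
print(rank_values(values, judged))
# With these toy labels, "user autonomy" ranks first (3 wins).
```

A real pipeline would replace the `judged` list with judge-model calls over many generated scenarios; the sorted output is the elicited value ranking.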
