[2602.22831] Moral Preferences of LLMs Under Directed Contextual Influence

[2602.22831] Moral Preferences of LLMs Under Directed Contextual Influence

arXiv - AI 4 min read Article

Summary

This paper explores how contextual influences affect the moral decision-making of large language models (LLMs) in scenarios akin to trolley problems, revealing significant shifts in outcomes based on contextual cues.

Why It Matters

Understanding how LLMs respond to contextual influences is crucial for developing ethical AI systems. This research highlights the complexities of moral decision-making in AI, emphasizing the need for improved evaluation methods that account for contextual factors.

Key Takeaways

  • Contextual influences can significantly alter LLM decisions, even with superficial relevance.
  • Baseline preferences do not reliably predict how models will respond to contextual cues.
  • Models may exhibit unexpected decision shifts, sometimes counter to their stated neutrality.
  • Incorporating reasoning can reduce sensitivity to context but amplify biases from few-shot examples.
  • Controlled context manipulations are necessary for more accurate moral evaluations of LLMs.

Computer Science > Machine Learning arXiv:2602.22831 (cs) [Submitted on 26 Feb 2026] Title:Moral Preferences of LLMs Under Directed Contextual Influence Authors:Phil Blandfort, Tushar Karayil, Urja Pawar, Robert Graham, Alex McKenzie, Dmitrii Krasheninnikov View a PDF of the paper titled Moral Preferences of LLMs Under Directed Contextual Influence, by Phil Blandfort and 5 other authors View PDF HTML (experimental) Abstract:Moral benchmarks for LLMs typically use context-free prompts, implicitly assuming stable preferences. In deployment, however, prompts routinely include contextual signals such as user requests, cues on social norms, etc. that may steer decisions. We study how directed contextual influences reshape decisions in trolley-problem-style moral triage settings. We introduce a pilot evaluation harness for directed contextual influence in trolley-problem-style moral triage: for each demographic factor, we apply matched, direction-flipped contextual influences that differ only in which group they favor, enabling systematic measurement of directional response. We find that: (i) contextual influences often significantly shift decisions, even when only superficially relevant; (ii) baseline preferences are a poor predictor of directional steerability, as models can appear baseline-neutral yet exhibit systematic steerability asymmetry under influence; (iii) influences can backfire: models may explicitly claim neutrality or discount the contextual cue, yet their choice...

Related Articles

Llms

Artificial intelligence will always depends on human otherwise it will be obsolete.

I was looking for a tool for my specific need. There was not any. So i started to write the program in python, just basic structure. Then...

Reddit - Artificial Intelligence · 1 min ·
Llms

My AI spent last night modifying its own codebase

I've been working on a local AI system called Apis that runs completely offline through Ollama. During a background run, Apis identified ...

Reddit - Artificial Intelligence · 1 min ·
Llms

Fake users generated by AI can't simulate humans — review of 182 research papers. Your thoughts?

https://www.researchsquare.com/article/rs-9057643/v1 There’s a massive trend right now where tech companies, businesses, even researchers...

Reddit - Artificial Intelligence · 1 min ·
Llms

Depth-first pruning seems to transfer from GPT-2 to Llama (unexpectedly well)

TL;DR: Removing the right transformer layers (instead of shrinking all layers) gives smaller, faster models with minimal quality loss — a...

Reddit - Artificial Intelligence · 1 min ·
More in Llms: This Week Guide Trending

No comments

No comments yet. Be the first to comment!

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime