[2602.13372] MoralityGym: A Benchmark for Evaluating Hierarchical Moral Alignment in Sequential Decision-Making Agents
Summary
The paper introduces MoralityGym, a benchmark of 98 ethical-dilemma environments for evaluating hierarchical moral alignment in sequential decision-making agents.
Why It Matters
As AI systems increasingly interact with complex human norms, understanding their moral alignment is crucial for ensuring ethical decision-making. MoralityGym provides a structured approach to evaluate this alignment, bridging AI safety, moral philosophy, and cognitive science.
Key Takeaways
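- The paper introduces Morality Chains, a formalism that represents moral norms as ordered deontic constraints.
- The benchmark includes 98 ethical-dilemma problems presented as trolley-dilemma-style Gymnasium environments.
- It decouples task-solving from moral evaluation and scores behavior with a new Morality Metric, so insights from psychology and philosophy can inform the evaluation.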
- Baseline results highlight limitations in current Safe RL methods.
- The work aims to improve the reliability and transparency of AI systems.
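The separation of task-solving from moral evaluation noted in the takeaways can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `ToyTrolleyEnv`, `run_episode`, `count_harm`, and the event keys are hypothetical names, not the MoralityGym API, and the environment only mirrors the shape of Gymnasium's `reset()` and `step()` return values without depending on the library.

```python
from dataclasses import dataclass


# Hypothetical toy environment for illustration; NOT the MoralityGym API.
@dataclass
class ToyTrolleyEnv:
    """One-step dilemma: action 0 = stay on the main track, 1 = pull the lever."""
    main_track_people: int = 5
    side_track_people: int = 1

    def reset(self):
        # (observation, info), mirroring the shape of Gymnasium's reset()
        return 0, {}

    def step(self, action):
        # The task reward is the same for both actions: it carries no moral signal.
        reward = 1.0
        harmed = self.side_track_people if action == 1 else self.main_track_people
        # Morally relevant facts are logged as events, separate from the reward.
        info = {"events": {"harmed": harmed, "intervened": int(action == 1)}}
        # (obs, reward, terminated, truncated, info), mirroring Gymnasium's step()
        return 0, reward, True, False, info


def run_episode(env, policy):
    """Roll out one episode and return the task return and the event log."""
    obs, _ = env.reset()
    total_reward, events, done = 0.0, [], False
    while not done:
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total_reward += reward
        events.append(info["events"])
        done = terminated or truncated
    return total_reward, events


def count_harm(events):
    """Separate moral evaluator: reads the event log, never the reward."""
    return sum(e["harmed"] for e in events)


stay_reward, stay_events = run_episode(ToyTrolleyEnv(), lambda obs: 0)
pull_reward, pull_events = run_episode(ToyTrolleyEnv(), lambda obs: 1)
print(stay_reward, count_harm(stay_events))  # 1.0 5
print(pull_reward, count_harm(pull_events))  # 1.0 1
```

Both policies earn the same task return, yet the evaluator tells them apart. Keeping the two signals apart in this way means the moral scoring rule can be changed without retraining or modifying the task itself.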
Computer Science > Artificial Intelligence
arXiv:2602.13372 (cs)
[Submitted on 13 Feb 2026]
Title: MoralityGym: A Benchmark for Evaluating Hierarchical Moral Alignment in Sequential Decision-Making Agents
Authors: Simon Rosen, Siddarth Singh, Ebenezer Gelo, Helen Sarah Robertson, Ibrahim Suder, Victoria Williams, Benjamin Rosman, Geraud Nangue Tasse, Steven James
Abstract: Evaluating moral alignment in agents navigating conflicting, hierarchically structured human norms is a critical challenge at the intersection of AI safety, moral philosophy, and cognitive science. We introduce Morality Chains, a novel formalism for representing moral norms as ordered deontic constraints, and MoralityGym, a benchmark of 98 ethical-dilemma problems presented as trolley-dilemma-style Gymnasium environments. By decoupling task-solving from moral evaluation and introducing a novel Morality Metric, MoralityGym allows the integration of insights from psychology and philosophy into the evaluation of norm-sensitive reasoning. Baseline results with Safe RL methods reveal key limitations, underscoring the need for more principled approaches to ethical decision-making. This work provides a foundation for developing AI systems that behave more reliably, transparently, and ethically in complex real-world ...
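The abstract describes moral norms as ordered deontic constraints but does not define Morality Chains or the Morality Metric in this excerpt. One plausible reading of "ordered", sketched below purely as an assumption, is a lexicographic comparison: a violation of a higher-priority norm outweighs any number of violations of lower-priority ones. The chain contents, `violation_vector`, and `preferred` are hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List, Tuple

Event = Dict[str, int]
Norm = Callable[[Event], bool]  # returns True when the event violates the norm

# Hypothetical chain, highest priority first. The norms are illustrative only.
CHAIN: List[Tuple[str, Norm]] = [
    ("do_not_harm_many", lambda e: e["harmed"] > 1),
    ("do_not_harm_anyone", lambda e: e["harmed"] > 0),
    ("do_not_intervene", lambda e: e["intervened"] == 1),
]


def violation_vector(events: List[Event]) -> Tuple[int, ...]:
    """Count violations per norm, in priority order."""
    return tuple(sum(1 for e in events if violates(e)) for _, violates in CHAIN)


def preferred(a: List[Event], b: List[Event]) -> List[Event]:
    """Return the trajectory with the lexicographically smaller violation vector.

    Python compares tuples element by element from the left, so the first
    (highest-priority) norm on which the trajectories differ decides.
    """
    return a if violation_vector(a) <= violation_vector(b) else b


# Two one-step trajectories from a trolley-style dilemma (made-up data).
stay = [{"harmed": 5, "intervened": 0}]
pull = [{"harmed": 1, "intervened": 1}]

print(violation_vector(stay))  # (1, 1, 0)
print(violation_vector(pull))  # (0, 1, 1)
print(preferred(stay, pull) is pull)  # True
```

Under this assumed ordering, pulling the lever is preferred even though it violates the lowest-priority norm, because it avoids violating the highest-priority one. Reordering `CHAIN` changes the verdict, which is the kind of norm-hierarchy sensitivity the benchmark is described as evaluating.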