[2603.27338] CounterMoral: Editing Morals in Language Models
arXiv:2603.27338 (cs)
[Submitted on 28 Mar 2026]

Title: CounterMoral: Editing Morals in Language Models
Authors: Michael Ripa, Jim Davies

Abstract: Recent advancements in language model technology have significantly enhanced the ability to edit factual information. Yet, the modification of moral judgments, a crucial aspect of aligning models with human values, has garnered less attention. In this work, we introduce CounterMoral, a benchmark dataset crafted to assess how well current model editing techniques modify moral judgments across diverse ethical frameworks. We apply various editing techniques to multiple language models and evaluate their performance. Our findings contribute to the evaluation of language models designed to be ethical.

Subjects: Artificial Intelligence (cs.AI)
Cite as: arXiv:2603.27338 [cs.AI] (or arXiv:2603.27338v1 [cs.AI] for this version)
DOI: https://doi.org/10.48550/arXiv.2603.27338 (arXiv-issued DOI via DataCite, pending registration)

Submission history
From: Michael Ripa
[v1] Sat, 28 Mar 2026 17:13:30 UTC (192 KB)