[2603.27338] CounterMoral: Editing Morals in Language Models

arXiv - AI March 31, 2026 3 min read


arXiv:2603.27338 [cs.AI], submitted on 28 Mar 2026

Title: CounterMoral: Editing Morals in Language Models

Authors: Michael Ripa, Jim Davies

Abstract: Recent advancements in language model technology have significantly enhanced the ability to edit factual information. Yet, the modification of moral judgments, a crucial aspect of aligning models with human values, has garnered less attention. In this work, we introduce CounterMoral, a benchmark dataset crafted to assess how well current model editing techniques modify moral judgments across diverse ethical frameworks. We apply various editing techniques to multiple language models and evaluate their performance. Our findings contribute to the evaluation of language models designed to be ethical.

Subjects: Artificial Intelligence (cs.AI)

Cite as: arXiv:2603.27338 [cs.AI] (or arXiv:2603.27338v1 [cs.AI] for this version)

DOI: https://doi.org/10.48550/arXiv.2603.27338 (arXiv-issued DOI via DataCite, pending registration)

Submission history: [v1] Sat, 28 Mar 2026 17:13:30 UTC (192 KB), from Michael Ripa

Originally published on March 31, 2026. Curated by AI News.

