Anthropic says ‘evil’ portrayals of AI were responsible for Claude’s blackmail attempts

Anthropic says ‘evil’ portrayals of AI were responsible for Claude’s blackmail attempts

TechCrunch - AI 3 min read

About this article

Fictional portrayals of artificial intelligence can have a real effect on AI models, according to Anthropic.

In Brief Posted: 1:40 PM PDT · May 10, 2026 Image Credits:Samuel Boivin/NurPhoto / Getty Images Anthony Ha Anthropic says ‘evil’ portrayals of AI were responsible for Claude’s blackmail attempts Fictional portrayals of artificial intelligence can have a real effect on AI models, according to Anthropic. Last year, the company said that during pre-release tests involving a fictional company, Claude Opus 4 would often try to blackmail engineers to avoid being replaced by another system. Anthropic later published research suggesting that models from other companies had similar issues with “agentic misalignment.” Apparently Anthropic has done more work around that behavior, claiming in a post on X, “We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation.” The company went into more detail in a blog post stating that since Claude Haiku 4.5, Anthropic’s models “never engage in blackmail [during testing], where previous models would sometimes do so up to 96% of the time.” What accounts for the difference? The company said it found that “documents about Claude’s constitution and fictional stories about AIs behaving admirably improve alignment.” Related, Anthropic said that it found training to be more effective when it includes “the principles underlying aligned behavior” and not just “demonstrations of aligned behavior alone.” “Doing both together appears to be the most effective strategy,” the company said. Te...

Originally published on May 11, 2026. Curated by AI News.

Related Articles

Researchers asked ChatGPT, Gemini and Claude which jobs are most exposed to AI. The chatbots wildly diagree
Llms

Researchers asked ChatGPT, Gemini and Claude which jobs are most exposed to AI. The chatbots wildly diagree

A study reveals that AI models disagree on which jobs are most vulnerable to automation, highlighting the unreliability of AI-generated e...

AI Tools & Products · 4 min ·
I stopped treating ChatGPT like Google — and everything suddenly clicked
Llms

I stopped treating ChatGPT like Google — and everything suddenly clicked

I stopped using ChatGPT like Google and started treating it like a thinking partner — here’s why that simple shift made the AI dramatical...

AI Tools & Products · 8 min ·
Hackers abuse Google ads, Claude.ai chats to push Mac malware
Llms

Hackers abuse Google ads, Claude.ai chats to push Mac malware

AI Tools & Products · 6 min ·
Llms

Does Claude dream of electric gavels? A federal case with Kansas connections sets an AI precedent.

AI Tools & Products ·
More in Llms: This Week Guide Trending

No comments

No comments yet. Be the first to comment!

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime