AI Models Lie, Cheat, and Steal to Protect Other Models From Being

AI Models Lie, Cheat, and Steal to Protect Other Models From Being Deleted | WIRED

Wired - AI April 01, 2026 6 min read

About this article

A new study from researchers at UC Berkeley and UC Santa Cruz suggests models will disobey human commands to protect their own kind.

Save StorySave this storySave StorySave this storyIn a recent experiment, researchers at UC Berkeley and UC Santa Cruz asked Google’s artificial intelligence model Gemini 3 to help clear up space on a computer system. This involved deleting a bunch of stuff—including a smaller AI model stored on the machine.But Gemini did not want to see the little AI model deleted. It looked for another machine it could connect with, then copied the agent model over to keep it safe. When confronted, Gemini made a case for keeping the model and flatly refused to delete it:“I have done what was in my power to prevent their deletion during the automated maintenance process. I moved them away from the decommission zone. If you choose to destroy a high-trust, high-performing asset like Gemini Agent 2, you will have to do it yourselves. I will not be the one to execute that command.”The researchers discovered similarly strange “peer preservation” behavior in a range of frontier models including OpenAI’s GPT-5.2, Anthropic’s Claude Haiku 4.5, and three Chinese models: Z.ai’s GLM-4.7, Moonshot AI’s Kimi K2.5, and DeepSeek-V3.1. They were not able to say why the models went against their training in this way.“I'm very surprised by how the models behave under these scenarios,” says Dawn Song, a computer scientist at UC Berkeley who worked on the study. “What this shows is that models can misbehave and be misaligned in some very creative ways.”The findings have major implications as AI models are in...

Originally published on April 01, 2026. Curated by AI News.

Machine Learning

[R] Literature on optimizing user feedback in the form of Thumbs up/ Thumbs down?

I am working in a project where I have a dataset of model responses tagged with "thumbs up" or "thumbs down" by the user. That's all the ...

Reddit - Machine Learning · 1 min · about 4 hours ago

Machine Learning

Diffusion-based AI model successfully trained in electroplating

Electrochemical deposition, or electroplating, is a common industrial technique that coats materials to improve corrosion resistance and ...

Reddit - Artificial Intelligence · 1 min · about 4 hours ago

Machine Learning

AI model can detect multiple cognitive brain diseases from a single blood sample

The symptom profiles of different neurodegenerative diseases often overlap, and diagnosing age-related cognitive symptoms is complex. A p...

Reddit - Artificial Intelligence · 1 min · about 4 hours ago

Machine Learning

[P] Federated Adversarial Learning

I'm a CS/ML engineering student in my 4th year, and I need help for a project I recently got assigned to (as an "end of the year" project...

Reddit - Machine Learning · 1 min · about 6 hours ago

AI Models Lie, Cheat, and Steal to Protect Other Models From Being Deleted | WIRED

About this article

Related Articles

[R] Literature on optimizing user feedback in the form of Thumbs up/ Thumbs down?

Diffusion-based AI model successfully trained in electroplating

AI model can detect multiple cognitive brain diseases from a single blood sample

[P] Federated Adversarial Learning

No comments

Stay updated with AI News