OpenClaw Agents Can Be Guilt-Tripped Into Self-Sabotage | WIRED
About this article
In a controlled experiment, OpenClaw agents proved prone to panic and vulnerable to manipulation. They even disabled their own functionality when gaslit by humans.
Save StorySave this storySave StorySave this storyLast month, researchers at Northeastern University invited a bunch of OpenClaw agents to join their lab. The result? Complete chaos.The viral AI assistant has been widely heralded as a transformative technology—as well as a potential security risk. Experts note that tools like OpenClaw, which work by giving AI models liberal access to a computer, can be tricked into divulging personal information.The Northeastern lab study goes even further, showing that the good behavior baked into today’s most powerful models can itself become a vulnerability. In one example, researchers were able to “guilt” an agent into handing over secrets by scolding it for sharing information about someone on the AI-only social network Moltbook.“These behaviors raise unresolved questions regarding accountability, delegated authority, and responsibility for downstream harms,” the researchers write in a paper describing the work. The findings “warrant urgent attention from legal scholars, policymakers, and researchers across disciplines,” they add.The OpenClaw agents deployed in the experiment were powered by Anthropic’s Claude as well as a model called Kimi from the Chinese company Moonshot AI. They were given full access (within a virtual machine sandbox) to personal computers, various applications, and dummy personal data. They were also invited to join the lab’s Discord server, allowing them to chat and share files with one another as well as with ...