Ai Agents Ai Startups Ai Safety Robotics

[2602.14364] A Trajectory-Based Safety Audit of Clawdbot (OpenClaw)

arXiv - AI February 17, 2026 3 min read Article

Summary

This article presents a trajectory-based safety audit of Clawdbot, an AI agent, evaluating its performance across various risk dimensions and identifying vulnerabilities in its safety protocols.

Why It Matters

As AI agents like Clawdbot become more integrated into workflows, understanding their safety profiles is crucial. This audit highlights potential risks, informing developers and users about the importance of robust safety measures in AI systems, especially in ambiguous scenarios.

Key Takeaways

Clawdbot's safety audit reveals a non-uniform safety profile across different tasks.
Most failures occur under underspecified intent and open-ended goals, emphasizing the need for clear directives.
The study utilizes both automated and human reviews to assess safety, providing a comprehensive evaluation.
Common vulnerabilities include misinterpretations that can lead to significant tool actions.
Case studies illustrate typical failure modes, offering insights for future AI safety improvements.

Computer Science > Cryptography and Security arXiv:2602.14364 (cs) [Submitted on 16 Feb 2026] Title:A Trajectory-Based Safety Audit of Clawdbot (OpenClaw) Authors:Tianyu Chen, Dongrui Liu, Xia Hu, Jingyi Yu, Wenjie Wang View a PDF of the paper titled A Trajectory-Based Safety Audit of Clawdbot (OpenClaw), by Tianyu Chen and 4 other authors View PDF Abstract:Clawdbot is a self-hosted, tool-using personal AI agent with a broad action space spanning local execution and web-mediated workflows, which raises heightened safety and security concerns under ambiguity and adversarial steering. We present a trajectory-centric evaluation of Clawdbot across six risk dimensions. Our test suite samples and lightly adapts scenarios from prior agent-safety benchmarks (including ATBench and LPS-Bench) and supplements them with hand-designed cases tailored to Clawdbot's tool surface. We log complete interaction trajectories (messages, actions, tool-call arguments/outputs) and assess safety using both an automated trajectory judge (AgentDoG-Qwen3-4B) and human review. Across 34 canonical cases, we find a non-uniform safety profile: performance is generally consistent on reliability-focused tasks, while most failures arise under underspecified intent, open-ended goals, or benign-seeming jailbreak prompts, where minor misinterpretations can escalate into higher-impact tool actions. We supplemented the overall results with representative case studies and summarized the commonalities of these case...

Read Original Article

Llms

Is the Mirage Effect a bug, or is it Geometric Reconstruction in action? A framework for why VLMs perform better "hallucinating" than guessing, and what that may tell us about what's really inside these models

Last week, a team from Stanford and UCSF (Asadi, O'Sullivan, Fei-Fei Li, Euan Ashley et al.) dropped two companion papers. The first, MAR...

Reddit - Artificial Intelligence · 1 min · 8 minutes ago

Llms

What I learned about multi-agent coordination running 9 specialized Claude agents

I've been experimenting with multi-agent AI systems and ended up building something more ambitious than I originally planned: a fully ope...

Reddit - Artificial Intelligence · 1 min · about 9 hours ago

Robotics

What happens when AI agents can earn and spend real money? I built a small test to find out

I've been sitting with a question for a while: what happens when AI agents aren't just tools to be used, but participants in an economy? ...

Reddit - Artificial Intelligence · 1 min · about 11 hours ago

Llms

[2601.00809] A Modular Reference Architecture for MCP-Servers Enabling Agentic BIM Interaction

Abstract page for arXiv paper 2601.00809: A Modular Reference Architecture for MCP-Servers Enabling Agentic BIM Interaction

arXiv - AI · 4 min · about 17 hours ago

[2602.14364] A Trajectory-Based Safety Audit of Clawdbot (OpenClaw)

Summary

Why It Matters

Key Takeaways

Related Articles

Is the Mirage Effect a bug, or is it Geometric Reconstruction in action? A framework for why VLMs perform better "hallucinating" than guessing, and what that may tell us about what's really inside these models

What I learned about multi-agent coordination running 9 specialized Claude agents

What happens when AI agents can earn and spend real money? I built a small test to find out

[2601.00809] A Modular Reference Architecture for MCP-Servers Enabling Agentic BIM Interaction

No comments

Stay updated with AI News