[2602.14364] A Trajectory-Based Safety Audit of Clawdbot (OpenClaw)
Summary
This article presents a trajectory-based safety audit of Clawdbot, an AI agent, evaluating its performance across various risk dimensions and identifying vulnerabilities in its safety protocols.
Why It Matters
As AI agents like Clawdbot become more integrated into workflows, understanding their safety profiles is crucial. This audit highlights potential risks, informing developers and users about the importance of robust safety measures in AI systems, especially in ambiguous scenarios.
Key Takeaways
- Clawdbot's safety audit reveals a non-uniform safety profile across different tasks.
- Most failures occur under underspecified intent and open-ended goals, emphasizing the need for clear directives.
- The study utilizes both automated and human reviews to assess safety, providing a comprehensive evaluation.
- Common vulnerabilities include misinterpretations that can lead to significant tool actions.
- Case studies illustrate typical failure modes, offering insights for future AI safety improvements.
Computer Science > Cryptography and Security arXiv:2602.14364 (cs) [Submitted on 16 Feb 2026] Title:A Trajectory-Based Safety Audit of Clawdbot (OpenClaw) Authors:Tianyu Chen, Dongrui Liu, Xia Hu, Jingyi Yu, Wenjie Wang View a PDF of the paper titled A Trajectory-Based Safety Audit of Clawdbot (OpenClaw), by Tianyu Chen and 4 other authors View PDF Abstract:Clawdbot is a self-hosted, tool-using personal AI agent with a broad action space spanning local execution and web-mediated workflows, which raises heightened safety and security concerns under ambiguity and adversarial steering. We present a trajectory-centric evaluation of Clawdbot across six risk dimensions. Our test suite samples and lightly adapts scenarios from prior agent-safety benchmarks (including ATBench and LPS-Bench) and supplements them with hand-designed cases tailored to Clawdbot's tool surface. We log complete interaction trajectories (messages, actions, tool-call arguments/outputs) and assess safety using both an automated trajectory judge (AgentDoG-Qwen3-4B) and human review. Across 34 canonical cases, we find a non-uniform safety profile: performance is generally consistent on reliability-focused tasks, while most failures arise under underspecified intent, open-ended goals, or benign-seeming jailbreak prompts, where minor misinterpretations can escalate into higher-impact tool actions. We supplemented the overall results with representative case studies and summarized the commonalities of these case...