ClawBench: Can AI Agents Complete Everyday Online Tasks? 153 tasks, 144 live websites, best model at 33.3% [R]

Reddit - Machine Learning 1 min read

About this article

We introduce ClawBench, a benchmark that evaluates AI browser agents on 153 real-world everyday tasks across 144 live websites. Unlike synthetic benchmarks, ClawBench tests agents on actual production platforms.

Key findings:

- The best model (Claude Sonnet 4.6) achieves only a 33.3% success rate.
- GLM-5 (Zhipu AI) comes second at 24.2%, surprisingly strong for a text-only model.
- Finance and Academic tasks are easier (50% for the best model); Travel and Dev tasks are much harder.
- No model exceeds 50...
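To put the headline percentages in concrete terms, the sketch below converts the two reported success rates into approximate task counts. It assumes each rate is computed over all 153 tasks, which the summary does not state explicitly; the function name `implied_successes` is hypothetical and used only for illustration.

```python
# Assumption: each reported success rate is measured over all 153 tasks.
TOTAL_TASKS = 153

# Success rates as reported in the summary above.
reported_rates = {
    "Claude Sonnet 4.6": 0.333,
    "GLM-5": 0.242,
}


def implied_successes(rate: float, total: int = TOTAL_TASKS) -> int:
    """Return the whole number of tasks closest to rate * total."""
    return round(rate * total)


for model, rate in reported_rates.items():
    solved = implied_successes(rate)
    # Recompute the percentage from the integer count as a consistency check.
    print(f"{model}: ~{solved}/{TOTAL_TASKS} tasks ({solved / TOTAL_TASKS:.1%})")
```

Under that assumption, the rates correspond to roughly 51 and 37 completed tasks, and both counts round back to the reported percentages.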


Originally published on April 14, 2026. Curated by AI News.

