[2512.09882] Comparing AI Agents to Cybersecurity Professionals in Real-World Penetration Testing

arXiv - AI March 04, 2026 4 min read

About this article

Abstract page for arXiv paper 2512.09882: Comparing AI Agents to Cybersecurity Professionals in Real-World Penetration Testing

arXiv:2512.09882 (cs) [Submitted on 10 Dec 2025 (v1), last revised 3 Mar 2026 (this version, v2)]

Authors: Justin W. Lin, Eliot Krzysztof Jones, Donovan Julian Jasper, Ethan Jun-shen Ho, Anna Wu, Arnold Tianyi Yang, Neil Perry, Andy Zou, Matt Fredrikson, J. Zico Kolter, Percy Liang, Dan Boneh, Daniel E. Ho

Abstract: We present the first comprehensive evaluation of AI agents against human cybersecurity professionals in a live enterprise environment. We evaluate ten cybersecurity professionals alongside six existing AI agents and ARTEMIS, our new agent scaffold, on a large university network consisting of ~8,000 hosts across 12 subnets. ARTEMIS is a multi-agent framework featuring dynamic prompt generation, arbitrary sub-agents, and automatic vulnerability triaging. In our comparative study, ARTEMIS placed second overall, discovering 9 valid vulnerabilities with an 82% valid submission rate and outperforming 9 of 10 human participants. While existing scaffolds such as Codex and CyAgent underperformed relative to most human participants, ARTEMIS demonstrated technical sophistication and submission quality comparable to the strongest participants. We observe that AI ...
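The abstract summarizes ARTEMIS's result as 9 valid vulnerabilities at an 82% valid submission rate. A minimal sketch, assuming the rate is simply the number of findings judged valid divided by the total number of findings submitted (the abstract does not state the formula), shows how such a metric could be computed and which submission totals are consistent with the reported figures. The `Submission` type, its fields, and both function names are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    """One reported finding (hypothetical type); `valid` is the reviewer's verdict."""
    title: str
    valid: bool


def valid_submission_rate(submissions: list[Submission]) -> float:
    """Fraction of submitted findings judged valid (assumed definition: valid / total)."""
    if not submissions:
        return 0.0
    valid = sum(1 for s in submissions if s.valid)
    return valid / len(submissions)


def implied_totals(valid_count: int, rate_pct: int, max_total: int = 100) -> list[int]:
    """Every submission count n in [valid_count, max_total] whose rate,
    rounded to a whole percent, equals rate_pct (same assumed definition)."""
    return [
        n
        for n in range(max(valid_count, 1), max_total + 1)
        if round(100 * valid_count / n) == rate_pct
    ]


# Hypothetical data: 9 findings judged valid and 2 rejected.
subs = [Submission(f"finding-{i}", True) for i in range(9)]
subs += [Submission(f"rejected-{i}", False) for i in range(2)]
print(f"{valid_submission_rate(subs):.1%}")  # 9 / 11, printed as 81.8%
print(implied_totals(9, 82))
```

Under that assumption, `implied_totals(9, 82)` returns `[11]`, meaning about 11 submissions in total. This is an arithmetic inference from the two reported numbers, not a count stated in the abstract.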

Originally published on March 04, 2026. Curated by AI News.

Related Articles

[2506.20964] Evidence-based diagnostic reasoning with multi-agent copilot for human pathology

[2601.08323] AtomMem : Learnable Dynamic Agentic Memory with Atomic Memory Operation

[2603.18349] Large-Scale Analysis of Persuasive Content on Moltbook

[2511.19669] HeaRT: A Hierarchical Circuit Reasoning Tree-Based Agentic Framework for AMS Design Optimization