[2602.17875] MultiVer: Zero-Shot Multi-Agent Vulnerability Detection
Summary
The paper presents MultiVer, a zero-shot multi-agent system for vulnerability detection. Without any fine-tuning, it reaches 82.7% recall on PyVul, exceeding fine-tuned GPT-3.5 (81.3%), at the cost of lower precision.
Why It Matters
In security applications where false negatives are costlier than false positives, recall is the metric that matters most. By surpassing a fine-tuned model on recall without any task-specific training, MultiVer shows that zero-shot multi-agent ensembles can be a viable alternative for vulnerability detection in software development.
Key Takeaways
- MultiVer achieves 82.7% recall on PyVul, exceeding fine-tuned GPT-3.5 (81.3%) by 1.4 percentage points.
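- The system uses a four-agent ensemble (security, correctness, performance, style) with union voting; the ensemble adds 17 percentage points of recall over single-agent security analysis.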
- The recall gain comes at a precision cost: 48.8% precision versus 63.9% for fine-tuned baselines, yielding 61.4% F1.
- On SecurityEval, the same zero-shot architecture achieves a 91.7% detection rate, matching specialized systems.
- The paper argues that recall should take priority in security applications where false negatives are costlier than false positives.
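The precision/recall trade-off can be checked with a short calculation. The sketch below recomputes F1 as the harmonic mean of the precision (48.8%) and recall (82.7%) reported in the abstract; the function name `f1_score` is our own and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# Figures reported for MultiVer on PyVul in the abstract.
multiver_f1 = f1_score(precision=0.488, recall=0.827)
print(f"MultiVer F1: {multiver_f1:.1%}")  # prints "MultiVer F1: 61.4%"
```

The result agrees with the 61.4% F1 stated in the abstract, which confirms that the reported figures are internally consistent.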
arXiv:2602.17875 (cs)
[Submitted on 19 Feb 2026]
Title: MultiVer: Zero-Shot Multi-Agent Vulnerability Detection
Authors: Shreshth Rajan
Abstract: We present MultiVer, a zero-shot multi-agent system for vulnerability detection that achieves state-of-the-art recall without fine-tuning. A four-agent ensemble (security, correctness, performance, style) with union voting achieves 82.7% recall on PyVul, exceeding fine-tuned GPT-3.5 (81.3%) by 1.4 percentage points -- the first zero-shot system to surpass fine-tuned performance on this benchmark. On SecurityEval, the same architecture achieves 91.7% detection rate, matching specialized systems. The recall improvement comes at a precision cost: 48.8% precision versus 63.9% for fine-tuned baselines, yielding 61.4% F1. Ablation experiments isolate component contributions: the multi-agent ensemble adds 17 percentage points recall over single-agent security analysis. These results demonstrate that for security applications where false negatives are costlier than false positives, zero-shot multi-agent ensembles can match and exceed fine-tuned models on the metric that matters most.
Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI)
Cite as: arXiv:2602.17875 [cs.MA] (or arXiv:2602.17875v1 [cs.MA] for this version)
https://doi.org/10.48550/arXiv.260...
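Union voting, as named in the abstract, can be read as "flag a sample if any agent flags it". The sketch below illustrates that rule only. The `Agent` type, the `union_vote` and `flagged_by` helpers, and the keyword-based toy agents are all hypothetical stand-ins; the paper's agents are zero-shot LLM-based analyses, and their actual prompts and interfaces are not given in this summary.

```python
from typing import Callable, Dict, List

# Hypothetical agent interface: takes source code, returns True if it flags an issue.
Agent = Callable[[str], bool]


def union_vote(code: str, agents: Dict[str, Agent]) -> bool:
    """Flag the sample as vulnerable if at least one agent flags it."""
    return any(agent(code) for agent in agents.values())


def flagged_by(code: str, agents: Dict[str, Agent]) -> List[str]:
    """Return the names of the agents that flagged the sample."""
    return [name for name, agent in agents.items() if agent(code)]


# Toy keyword heuristics standing in for the four agent roles named in the abstract.
# These are placeholders for illustration, not the paper's agents.
AGENTS: Dict[str, Agent] = {
    "security": lambda code: "eval(" in code,
    "correctness": lambda code: "except:" in code,
    "performance": lambda code: False,
    "style": lambda code: False,
}

if __name__ == "__main__":
    sample = "try:\n    run()\nexcept:\n    pass\n"
    print(union_vote(sample, AGENTS), flagged_by(sample, AGENTS))
```

Because a single positive vote is enough, union voting can only add detections relative to any one agent. This is consistent with the reported pattern of higher recall and lower precision than the fine-tuned baselines.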