[2511.21448] The Phish, The Spam, and The Valid: Generating Feature-Rich Emails for Benchmarking LLMs

arXiv - AI March 23, 2026 4 min read

Computer Science > Cryptography and Security

arXiv:2511.21448 (cs)

[Submitted on 26 Nov 2025 (v1), last revised 20 Mar 2026 (this version, v5)]

Title: The Phish, The Spam, and The Valid: Generating Feature-Rich Emails for Benchmarking LLMs

Authors: Rebeka Toth, Tamas Bisztray, Nils Gruschka

Abstract: In this paper, we introduce a metadata-enriched generation framework (PhishFuzzer) that seeds real emails into Large Language Models (LLMs) to produce 23,100 diverse, structurally consistent email variants across controlled entity and length dimensions. Unlike prior corpora, our dataset features strict three-class labels (Phishing, Spam, Valid), provides full URL and attachment metadata, and annotates each email with attacker intent. Using this dataset, we benchmark two state-of-the-art LLMs (Qwen-2.5-72B and Gemini-3.1-Pro) under both Basic (body, subject) and Full (+URL, sender, attachment) settings. By applying formal confidence metrics (Task Success Rate and Confidence Index), we analyze model reliability, robustness against linguistic fuzzing, and the impact of structural metadata on detection accuracy. Our fully open-source framework and dataset provide a rigorous foundation for evaluating next-generation email security systems. To support open science, we make the PhishFuzzer Dataset, the ge...

Originally published on March 23, 2026. Curated by AI News.
