[2603.22499] OrgForge-IT: A Verifiable Synthetic Benchmark for LLM-Based Insider Threat Detection
Computer Science > Cryptography and Security
arXiv:2603.22499 (cs) [Submitted on 23 Mar 2026]

Title: OrgForge-IT: A Verifiable Synthetic Benchmark for LLM-Based Insider Threat Detection
Authors: Jeffrey Flynt

Abstract: Synthetic insider threat benchmarks face a consistency problem: corpora generated without an external factual constraint cannot rule out cross-artifact contradictions. The CERT dataset -- the field's canonical benchmark -- is also static, lacks cross-surface correlation scenarios, and predates the LLM era. We present OrgForge-IT, a verifiable synthetic benchmark in which a deterministic simulation engine maintains ground truth and language models generate only surface prose, making cross-artifact consistency an architectural guarantee. The corpus spans 51 simulated days, 2,904 telemetry records at a 96.4% noise rate, and four detection scenarios designed to defeat single-surface and single-day triage strategies across three threat classes and eight injectable behaviors. A ten-model leaderboard reveals several findings: (1) triage and verdict accuracy dissociate: eight models achieve identical triage F1=0.80 yet split between verdict F1=1.0 and 0.80; (2) baseline false-positive rate is a necessary companion to verdict F1, with models at identical verdict accuracy differing by two orders of magnitude o...
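The abstract's central architectural claim is that a deterministic simulation engine owns all facts while language models only phrase them, so surface artifacts cannot contradict each other. A minimal sketch of that separation, assuming hypothetical names (`sim_engine`, `render_prose`) that are illustrative and not from the paper:

```python
import random

def sim_engine(seed: int, days: int) -> list[dict]:
    """Deterministic engine: the same seed always yields the identical
    ground-truth event stream, so ground truth is reproducible by construction."""
    rng = random.Random(seed)
    events = []
    for day in range(days):
        events.append({
            "day": day,
            "user": f"user{rng.randrange(8)}",
            "action": rng.choice(["login", "file_copy", "usb_mount", "email"]),
        })
    return events

def render_prose(event: dict) -> str:
    """Stand-in for the LLM surface layer: prose is derived only from facts
    the engine emitted, so cross-artifact contradictions cannot arise here."""
    return f"Day {event['day']}: {event['user']} performed {event['action']}."

# Two runs with the same seed produce identical ground truth;
# the prose layer adds wording, never facts.
a = sim_engine(seed=42, days=3)
b = sim_engine(seed=42, days=3)
assert a == b
print(render_prose(a[0]))
```

The design point is that consistency is a property of the pipeline, not something checked after generation: any two artifacts describing the same event render from the same engine record.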
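Finding (2) says baseline false-positive rate must accompany verdict F1 because models with identical verdict accuracy can differ by orders of magnitude in FPR. A small sketch of the arithmetic behind that claim, using invented counts (not the paper's numbers) for a corpus of roughly 2,800 benign records and four scenarios:

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def fpr(fp: int, tn: int) -> float:
    """False-positive rate over benign baseline records."""
    return fp / (fp + tn) if fp + tn else 0.0

# Hypothetical models A and B: both produce correct verdicts on all
# four scenarios, so verdict F1 = 1.0 for each...
model_a = {"verdict_f1": f1(tp=4, fp=0, fn=0), "fpr": fpr(fp=1, tn=2799)}
model_b = {"verdict_f1": f1(tp=4, fp=0, fn=0), "fpr": fpr(fp=100, tn=2700)}

assert model_a["verdict_f1"] == model_b["verdict_f1"] == 1.0
# ...yet their baseline FPRs differ by two orders of magnitude,
# which is invisible if only verdict F1 is reported.
assert abs(model_b["fpr"] / model_a["fpr"] - 100.0) < 1e-9
```

This is why a 96.4% noise rate matters: with benign records vastly outnumbering threat records, even a small per-record false-positive probability produces a flood of spurious flags that verdict-level scoring alone never surfaces.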