[2601.22983] PIDSMaker: Building and Evaluating Provenance-based Intrusion Detection Systems

Summary

PIDSMaker is an open-source framework designed for building and evaluating provenance-based intrusion detection systems (PIDSs), addressing inconsistencies in prior evaluations.

Why It Matters

PIDSMaker makes PIDS results easier to reproduce and compare, which matters because these systems are used to detect advanced persistent threats (APTs). By standardizing preprocessing, ground-truth labels, and evaluation protocols, it lets researchers compare systems on equal terms and reduces the re-implementation overhead that prior evaluations imposed.

Key Takeaways

  • PIDSMaker consolidates eight state-of-the-art PIDSs into a modular, extensible architecture.
  • Standardized preprocessing and ground-truth labels enable consistent, like-for-like comparisons.
  • A YAML-based configuration interface supports rapid prototyping by composing components across systems without code changes.
  • Utilities for ablation studies, hyperparameter tuning, and visualization are included.
  • Preprocessed datasets and labels are provided to facilitate shared evaluations.
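
To make the idea of composing components from configuration more concrete, here is a minimal sketch in Python. It is not PIDSMaker's actual API or configuration schema: the registry, the stage names (`featurization`, `detection`), the component names, and the `CONFIG` layout are all hypothetical, and a plain dictionary stands in for what a parsed YAML file might look like. The point is only that swapping a component means changing a configuration value rather than editing pipeline code.

```python
from typing import Any, Callable, Dict, List, Tuple

Edge = Tuple[str, str]

# Hypothetical registry: stage name -> component name -> function.
REGISTRY: Dict[str, Dict[str, Callable[..., Any]]] = {}


def register(stage: str, name: str):
    """Decorator that records a component under a stage and a name."""
    def wrap(fn):
        REGISTRY.setdefault(stage, {})[name] = fn
        return fn
    return wrap


@register("featurization", "degree")
def degree_features(edges: List[Edge]) -> Dict[str, int]:
    """Score each node by the number of edges that touch it."""
    counts: Dict[str, int] = {}
    for src, dst in edges:
        counts[src] = counts.get(src, 0) + 1
        counts[dst] = counts.get(dst, 0) + 1
    return counts


@register("featurization", "out_degree")
def out_degree_features(edges: List[Edge]) -> Dict[str, int]:
    """Score each node by the number of edges that leave it."""
    counts: Dict[str, int] = {}
    for src, dst in edges:
        counts[src] = counts.get(src, 0) + 1
        counts.setdefault(dst, 0)
    return counts


@register("detection", "threshold")
def threshold_detector(scores: Dict[str, int], threshold: int = 2) -> List[str]:
    """Flag nodes whose score is strictly above the threshold."""
    return sorted(node for node, score in scores.items() if score > threshold)


def build_pipeline(config: Dict[str, Any]) -> Callable[[List[Edge]], List[str]]:
    """Look up the configured components and chain them together."""
    featurize = REGISTRY["featurization"][config["featurization"]["name"]]
    detect_cfg = config["detection"]
    detect = REGISTRY["detection"][detect_cfg["name"]]
    params = detect_cfg.get("params", {})

    def run(edges: List[Edge]) -> List[str]:
        return detect(featurize(edges), **params)

    return run


# Stand-in for a parsed YAML file (hypothetical layout).
CONFIG = {
    "featurization": {"name": "degree"},
    "detection": {"name": "threshold", "params": {"threshold": 2}},
}

# Toy provenance edges: (source entity, destination entity).
EDGES = [
    ("bash", "curl"),
    ("bash", "cat"),
    ("bash", "nc"),
    ("curl", "payload.bin"),
]

if __name__ == "__main__":
    print(build_pipeline(CONFIG)(EDGES))
```

Changing `"degree"` to `"out_degree"` in `CONFIG` selects a different featurization component without touching `build_pipeline`, which is the kind of mix-and-match the configuration interface is described as enabling.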

arXiv:2601.22983 (cs), Computer Science > Cryptography and Security
Submitted on 30 Jan 2026 (v1), last revised 13 Feb 2026 (this version, v2)

Title: PIDSMaker: Building and Evaluating Provenance-based Intrusion Detection Systems

Authors: Tristan Bilot, Baoxiang Jiang, Thomas Pasquier

Abstract: Recent provenance-based intrusion detection systems (PIDSs) have demonstrated strong potential for detecting advanced persistent threats (APTs) by applying machine learning to system provenance graphs. However, evaluating and comparing PIDSs remains difficult: prior work uses inconsistent preprocessing pipelines, non-standard dataset splits, and incompatible ground-truth labeling and metrics. These discrepancies undermine reproducibility, impede fair comparison, and impose substantial re-implementation overhead on researchers. We present PIDSMaker, an open-source framework for developing and evaluating PIDSs under consistent protocols. PIDSMaker consolidates eight state-of-the-art systems into a modular, extensible architecture with standardized preprocessing and ground-truth labels, enabling consistent experiments and apples-to-apples comparisons. A YAML-based configuration interface supports rapid prototyping by composing components across systems without code changes. PIDSMaker also includes utilities for ablation studies, hyp...
