[2602.02929] RPG-AE: Neuro-Symbolic Graph Autoencoders with Rare Pattern Mining for Provenance-Based Anomaly Detection
Summary
This paper presents RPG-AE, a neuro-symbolic framework combining Graph Autoencoders and rare pattern mining for detecting Advanced Persistent Threats in system-level provenance data.
Why It Matters
Advanced Persistent Threats operate stealthily and blend into normal system behavior, which makes them hard to detect. This work couples a Graph Autoencoder, which flags processes whose observed graph structure deviates from its reconstruction, with rare pattern mining, which boosts the anomaly scores of processes showing infrequent behavioral co-occurrences.
Key Takeaways
- RPG-AE combines Graph Autoencoders with rare pattern mining for anomaly detection.
- The method targets Advanced Persistent Threat (APT)-like activity in system-level provenance data.
- Evaluation on the DARPA Transparent Computing datasets shows substantial gains in anomaly ranking quality.
- The approach outperforms existing unsupervised methods and is competitive with ensemble techniques.
- Coupling graph-based learning with classical pattern mining enhances interpretability and effectiveness.
Computer Science > Machine Learning
arXiv:2602.02929 (cs)
[Submitted on 3 Feb 2026 (v1), last revised 15 Feb 2026 (this version, v2)]
Title: RPG-AE: Neuro-Symbolic Graph Autoencoders with Rare Pattern Mining for Provenance-Based Anomaly Detection
Authors: Asif Tauhid, Sidahmed Benabderrahmane, Mohamad Altrabulsi, Ahamed Foisal, Talal Rahwan
Abstract: Advanced Persistent Threats (APTs) are sophisticated, long-term cyberattacks that are difficult to detect because they operate stealthily and often blend into normal system behavior. This paper presents a neuro-symbolic anomaly detection framework that combines a Graph Autoencoder (GAE) with rare pattern mining to identify APT-like activities in system-level provenance data. Our approach first constructs a process behavioral graph using k-Nearest Neighbors based on feature similarity, then learns normal relational structure using a Graph Autoencoder. Anomaly candidates are identified through deviations between observed and reconstructed graph structure. To further improve detection, we integrate a rare pattern mining module that discovers infrequent behavioral co-occurrences and uses them to boost anomaly scores for processes exhibiting rare signatures. We evaluate the proposed method on the DARPA Transparent Computing datasets an...
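
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each process is described by a set of event types, uses Jaccard similarity for the k-Nearest Neighbors graph, treats a "rare pattern" as a pair of features that co-occurs in at most `max_support` processes, and applies a multiplicative boost with a hypothetical weight `alpha`. The Graph Autoencoder itself is not implemented: its reconstruction errors are passed in as placeholder numbers, where a trained model would supply them. All function names and the toy data are illustrative.

```python
from collections import Counter
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_graph(features, k):
    """Connect each process to its k most similar processes (undirected edges)."""
    edges = set()
    for i, fi in enumerate(features):
        # Rank all other processes by similarity; ties are broken by index.
        ranked = sorted(
            (j for j in range(len(features)) if j != i),
            key=lambda j: (-jaccard(fi, features[j]), j),
        )
        for j in ranked[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def rare_pairs(features, max_support):
    """Return feature pairs that co-occur in at most max_support processes."""
    counts = Counter()
    for f in features:
        counts.update(combinations(sorted(f), 2))
    return {pair for pair, c in counts.items() if c <= max_support}


def boosted_scores(recon_errors, features, rare, alpha=0.5):
    """Scale each base score by the number of rare pairs the process exhibits.

    The multiplicative form and alpha are assumptions made for illustration.
    """
    scores = []
    for err, f in zip(recon_errors, features):
        hits = sum(1 for pair in combinations(sorted(f), 2) if pair in rare)
        scores.append(err * (1.0 + alpha * hits))
    return scores


# Toy data: five processes, each a set of hypothetical event types.
features = [
    {"read", "write"},
    {"read", "write"},
    {"read", "write", "fork"},
    {"read", "write"},
    {"read", "connect", "exec"},
]
# Placeholder reconstruction errors; a trained GAE would produce these.
recon_errors = [0.10, 0.12, 0.20, 0.11, 0.40]

edges = knn_graph(features, k=2)
rare = rare_pairs(features, max_support=1)
scores = boosted_scores(recon_errors, features, rare)
ranking = sorted(range(len(scores)), key=lambda i: -scores[i])
print(ranking)
```

On this toy data, process 4 ranks first because it has the largest placeholder reconstruction error and three rare feature pairs. Process 2 moves ahead of the remaining processes only because of its two rare pairs, which is the kind of re-ranking the rare pattern boost is meant to produce.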