[2603.19864] NASimJax: GPU-Accelerated Policy Learning Framework for Penetration Testing
Computer Science > Machine Learning
arXiv:2603.19864 (cs)
[Submitted on 20 Mar 2026]

Title: NASimJax: GPU-Accelerated Policy Learning Framework for Penetration Testing
Authors: Raphael Simon, José Carrasquel, Wim Mees, Pieter Libin

Abstract: Penetration testing, the practice of simulating cyberattacks to identify vulnerabilities, is a complex sequential decision-making task that is inherently partially observable and features large action spaces. Training reinforcement learning (RL) policies for this domain faces a fundamental bottleneck: existing simulators are too slow to train on realistic network scenarios at scale, resulting in policies that fail to generalize. We present NASimJax, a complete JAX-based reimplementation of the Network Attack Simulator (NASim), achieving up to 100x higher environment throughput than the original simulator. By running the entire training pipeline on hardware accelerators, NASimJax enables experimentation on larger networks under fixed compute budgets that were previously infeasible. We formulate automated penetration testing as a Contextual POMDP and introduce a network generation pipeline that produces structurally diverse and guaranteed-solvable scenarios. Together, these provide a principled basis for studying zero-shot policy generali...
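
The throughput claim in the abstract rests on a general JAX pattern: write the transition function for a single environment, then batch it with `jax.vmap` and unroll it over time with `jax.lax.scan` so the whole rollout compiles to one accelerator program. The sketch below illustrates only that pattern. It is not NASimJax's API: the toy transition, the names `step`, `batched_step`, and `rollout`, and all constants are assumptions made for illustration, and the random action choice stands in for a learned policy.

```python
import jax
import jax.numpy as jnp

# All constants below are hypothetical values chosen for this toy example.
NUM_HOSTS = 8        # hosts in the toy network
NUM_ENVS = 1024      # environments stepped in parallel
NUM_STEPS = 32       # rollout length
SUCCESS_PROB = 0.7   # chance that an exploit attempt succeeds


def step(compromised, action, key):
    """Toy transition for ONE environment: attempt to compromise host `action`."""
    success = jax.random.bernoulli(key, SUCCESS_PROB)
    # Reward is paid only the first time a host is compromised.
    newly = success & ~compromised[action]
    compromised = compromised.at[action].set(compromised[action] | success)
    return compromised, newly.astype(jnp.float32)


# Vectorize the single-environment step over a leading batch axis.
batched_step = jax.vmap(step)


@jax.jit
def rollout(key):
    """Run NUM_ENVS toy environments for NUM_STEPS steps with random actions."""

    def body(carry, _):
        compromised, key = carry
        key, k_act, k_env = jax.random.split(key, 3)
        # Uniform random actions stand in for a policy network here.
        actions = jax.random.randint(k_act, (NUM_ENVS,), 0, NUM_HOSTS)
        env_keys = jax.random.split(k_env, NUM_ENVS)
        compromised, rewards = batched_step(compromised, actions, env_keys)
        return (compromised, key), rewards

    init = jnp.zeros((NUM_ENVS, NUM_HOSTS), dtype=bool)
    (final, _), rewards = jax.lax.scan(body, (init, key), None, length=NUM_STEPS)
    return final, rewards


final_state, rewards = rollout(jax.random.PRNGKey(0))
returns = rewards.sum(axis=0)  # per-environment episode return
```

Here `rewards` has shape `(NUM_STEPS, NUM_ENVS)`, and each environment's return equals the number of hosts marked compromised in its final state. Because no Python loop runs per environment or per step after compilation, the batch size can grow with available accelerator memory, which is the kind of scaling the abstract describes.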