[2602.16564] A Scalable Approach to Solving Simulation-Based Network Security Games
Summary
The paper presents MetaDOAR, a scalable meta-controller for solving simulation-based network security games, enhancing multi-agent reinforcement learning efficiency in large cyber environments.
Why It Matters
As cyber threats become more complex, efficient strategies for network security are critical. MetaDOAR's approach offers a practical solution for improving decision-making in large-scale environments, potentially leading to better security outcomes and resource management.
Key Takeaways
- MetaDOAR enhances the Double Oracle / PSRO paradigm with a learned filtering layer.
- It enables scalable multi-agent reinforcement learning in large network environments.
- The method reduces redundant computations while maintaining decision quality.
- Empirical results show higher player payoffs compared to state-of-the-art baselines.
- The approach provides a theoretically motivated path for hierarchical policy learning.
Computer Science > Machine Learning arXiv:2602.16564 (cs) [Submitted on 18 Feb 2026] Title:A Scalable Approach to Solving Simulation-Based Network Security Games Authors:Michael Lanier, Yevgeniy Vorobeychik View a PDF of the paper titled A Scalable Approach to Solving Simulation-Based Network Security Games, by Michael Lanier and Yevgeniy Vorobeychik View PDF HTML (experimental) Abstract:We introduce MetaDOAR, a lightweight meta-controller that augments the Double Oracle / PSRO paradigm with a learned, partition-aware filtering layer and Q-value caching to enable scalable multi-agent reinforcement learning on very large cyber-network environments. MetaDOAR learns a compact state projection from per node structural embeddings to rapidly score and select a small subset of devices (a top-k partition) on which a conventional low-level actor performs focused beam search utilizing a critic agent. Selected candidate actions are evaluated with batched critic forwards and stored in an LRU cache keyed by a quantized state projection and local action identifiers, dramatically reducing redundant critic computation while preserving decision quality via conservative k-hop cache invalidation. Empirically, MetaDOAR attains higher player payoffs than SOTA baselines on large network topologies, without significant scaling issues in terms of memory usage or training time. This contribution provide a practical, theoretically motivated path to efficient hierarchical policy learning for large-s...