[2602.19025] Routing-Aware Explanations for Mixture of Experts Graph Models in Malware Detection
Summary
This article presents a novel approach to malware detection using Mixture-of-Experts (MoE) graph models, emphasizing routing-aware explanations for improved transparency and accuracy.
Why It Matters
As malware threats evolve, effective detection methods are crucial for cybersecurity. This research enhances the interpretability of machine learning models in malware detection, addressing the need for transparency in AI decision-making processes.
Key Takeaways
- Introduces a Mixture-of-Experts (MoE) model for malware detection using control flow graphs (CFGs).
- Demonstrates improved detection accuracy and transparency through routing-aware explanations.
- Combines multiple neighborhood statistics to enhance node representation in graphs.
- Evaluates against established GNN baselines, showing superior performance.
- Highlights the importance of expert-level diversity in AI model decision-making.
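The takeaways above describe fusing several neighborhood statistics into one node representation. A minimal NumPy sketch of that idea follows, assuming a toy adjacency-list graph; the function names, the single linear map standing in for the paper's MLP, and the example (rho, lambda) views are illustrative, not taken from the paper.

```python
import numpy as np

def neighborhood_stat(x, adj, rho=0.5, lam="mean"):
    """Pool neighbor features for every node under one (rho, lambda) view.

    x:   (n, d) node feature matrix
    adj: list of neighbor-index lists, one per node
    rho: degree-reweighting exponent; the pooled vector of node v is
         scaled by deg(v) ** rho
    lam: pooling choice in {"mean", "std", "max"}
    """
    pools = {"mean": np.mean, "std": np.std, "max": np.max}
    out = np.zeros_like(x)
    for v, nbrs in enumerate(adj):
        if not nbrs:
            continue
        h = x[nbrs]                                    # neighbor features, (deg(v), d)
        out[v] = pools[lam](h, axis=0) * (len(nbrs) ** rho)
    return out

def fuse_views(x, adj, views, W):
    """Concatenate several (rho, lambda) poolings and fuse them with a
    single linear map W (a stand-in for the MLP described in the paper)."""
    z = np.concatenate([neighborhood_stat(x, adj, r, l) for r, l in views], axis=1)
    return np.tanh(z @ W)                              # (n, d_out) fused node representations

# Toy 4-node path graph 0-1-2-3 with 2-dimensional node features.
adj = [[1], [0, 2], [1, 3], [2]]
x = np.arange(8, dtype=float).reshape(4, 2)
views = [(0.0, "mean"), (0.5, "max")]                  # two complementary structural views
rng = np.random.default_rng(0)
W = rng.normal(size=(2 * len(views), 2))
h = fuse_views(x, adj, views, W)
print(h.shape)  # (4, 2)
```

Each (rho, lambda) pair emphasizes different structural cues: a high rho amplifies high-degree CFG nodes (e.g. dispatch blocks), while std and max pooling capture neighborhood spread and extremes that a plain mean would smooth away.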
Computer Science > Cryptography and Security
arXiv:2602.19025 (cs)
[Submitted on 22 Feb 2026]
Title: Routing-Aware Explanations for Mixture of Experts Graph Models in Malware Detection
Authors: Hossein Shokouhinejad, Roozbeh Razavi-Far, Griffin Higgins, Ali A. Ghorbani
Abstract: Mixture-of-Experts (MoE) offers flexible graph reasoning by combining multiple views of a graph through a learned router. We investigate routing-aware explanations for MoE graph models in malware detection using control flow graphs (CFGs). Our architecture builds diversity at two levels. At the node level, each layer computes multiple neighborhood statistics and fuses them with an MLP, guided by a degree-reweighting factor rho and a pooling choice lambda in {mean, std, max}, producing distinct node representations that capture complementary structural cues in CFGs. At the readout level, six experts, each tied to a specific (rho, lambda) view, output graph-level logits that the router weights into a final prediction. Post-hoc explanations are generated with edge-level attributions per expert and aggregated using the router gates, so the rationale reflects both what each expert highlights and how strongly it is selected. Evaluated against single-expert GNN baselines such as GCN, GIN, and GAT on the same CFG dataset, th...
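The abstract's readout stage, combining six experts' graph-level logits through router gates and reusing the same gates to aggregate per-expert edge attributions, can be sketched as below. This is a hedged illustration with randomly generated stand-in tensors; the shapes, names, and the softmax gating are assumptions, not details confirmed by the paper.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def route_and_explain(expert_logits, expert_edge_attr, router_scores):
    """Gate expert predictions and build a routing-aware explanation.

    expert_logits:    (E, C) graph-level logits, one row per expert
    expert_edge_attr: (E, m) edge-level attributions, one row per expert
    router_scores:    (E,) unnormalized router outputs for this graph
    """
    gates = softmax(router_scores)         # (E,) router gate weights, sum to 1
    logits = gates @ expert_logits         # (C,) gate-weighted final prediction
    edge_attr = gates @ expert_edge_attr   # (m,) attribution reflecting both what
                                           # each expert highlights and how strongly
                                           # the router selects it
    return logits, edge_attr, gates

# Six experts, binary malware/benign logits, five edges in a toy CFG.
E, C, m = 6, 2, 5
rng = np.random.default_rng(1)
logits, attr, gates = route_and_explain(
    rng.normal(size=(E, C)), rng.normal(size=(E, m)), rng.normal(size=E))
print(gates.sum())  # ≈ 1.0
```

Weighting attributions by the gates, rather than averaging them uniformly, keeps the explanation faithful to the routing decision: an expert the router largely ignores contributes little to the final rationale even if its own attribution map is sharp.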