[2604.02353] Prism: Policy Reuse via Interpretable Strategy Mapping in Reinforcement Learning
Computer Science > Machine Learning

arXiv:2604.02353 (cs)

[Submitted on 4 Mar 2026]

Title: Prism: Policy Reuse via Interpretable Strategy Mapping in Reinforcement Learning

Authors: Thomas Pravetz

Abstract: We present PRISM (Policy Reuse via Interpretable Strategy Mapping), a framework that grounds reinforcement learning agents' decisions in discrete, causally validated concepts and uses those concepts as a zero-shot transfer interface between agents trained with different algorithms. PRISM clusters each agent's encoder features into $K$ concepts via K-means. Causal intervention establishes that these concepts directly drive, not merely correlate with, agent behavior: overriding concept assignments changes the selected action in 69.4% of interventions ($p = 8.6 \times 10^{-86}$, 2500 interventions). Concept importance and usage frequency are dissociated: the most-used concept (C47, 33.0% frequency) causes only a 9.4% win-rate drop when ablated, while ablating C16 (15.4% frequency) collapses the win rate from 100% to 51.8%. Because concepts causally encode strategy, aligning them via optimal bipartite matching transfers strategic knowledge zero-shot. On 7$\times$7 Go with three independently trained agents, concept transfer achieves 69.5%$\pm$3.2% and 76.4%$\pm$3.4% win rates against a standard engine across the two success...
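The concept-extraction step the abstract describes (clustering each agent's encoder features into $K$ discrete concepts via K-means, then assigning each state a concept id) can be sketched in plain Python. This is a minimal illustration, not PRISM's actual implementation; the function names, feature shapes, and tiny $K$ are assumptions for the example:

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def kmeans(features, k, iters=20, seed=0):
    """Lloyd's K-means: cluster encoder feature vectors into k concept centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(features, k)  # initialize centroids from data points
    for _ in range(iters):
        # Assignment step: each feature vector goes to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for x in features:
            j = min(range(k), key=lambda c: dist2(x, centroids[c]))
            clusters[j].append(x)
        # Update step: each centroid becomes the mean of its assigned vectors.
        for j, members in enumerate(clusters):
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def assign_concept(x, centroids):
    """Map one encoder feature vector to its discrete concept id."""
    return min(range(len(centroids)), key=lambda j: dist2(x, centroids[j]))
```

With two well-separated blobs of feature vectors and `k=2`, `assign_concept` maps points from different blobs to different concept ids, which is the discrete interface the rest of the pipeline operates on.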
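The transfer interface rests on optimal bipartite matching between two agents' concept sets. A brute-force exact matcher over centroid similarities illustrates the idea (exact but only viable for small $K$; a practical system would use the Hungarian algorithm, e.g. `scipy.optimize.linear_sum_assignment`). All names and the similarity choice below are illustrative assumptions:

```python
from itertools import permutations

def similarity(a, b):
    """Similarity as negative squared distance between concept centroids."""
    return -sum((u - v) ** 2 for u, v in zip(a, b))

def align_concepts(src_centroids, tgt_centroids):
    """Optimal bipartite matching: map each source concept id to the target
    concept id so that total centroid similarity is maximized.
    Brute force over all permutations -- exact, but O(K!)."""
    k = len(src_centroids)
    best_perm, best_score = None, float("-inf")
    for perm in permutations(range(k)):
        score = sum(similarity(src_centroids[i], tgt_centroids[perm[i]])
                    for i in range(k))
        if score > best_score:
            best_perm, best_score = perm, score
    return {i: j for i, j in enumerate(best_perm)}
```

Given the mapping, the source agent's concept-conditioned strategy can be looked up under the target agent's concept ids without any retraining, which is the zero-shot transfer mechanism the abstract refers to.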