[2509.22566] From Parameters to Behaviors: Unsupervised Compression of the Policy Space
Summary
This paper presents an unsupervised method for compressing the policy parameter space in Deep Reinforcement Learning, enhancing sample efficiency and adaptability across tasks.
Why It Matters
The research addresses a critical challenge in Deep Reinforcement Learning (DRL): sample inefficiency, which is greatly compounded in multi-task settings. By compressing the high-dimensional, highly redundant policy parameter space into a low-dimensional latent space organized by behavior, it opens avenues for more efficient learning and adaptation across tasks.
Key Takeaways
- Introduces a method to compress the policy parameter space, improving efficiency.
- Demonstrates that compression can retain expressivity while reducing dimensionality.
- Validates the approach in continuous control domains, showing that standard policy-network parameterizations can be compressed by several orders of magnitude.
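The second takeaway hinges on the behavioral reconstruction loss described in the abstract. A plausible formalization (the state distribution $\rho$ and the squared action distance are assumptions for illustration, not taken from the paper) is:

```latex
\mathcal{L}(z, \theta)
\;=\;
\mathbb{E}_{s \sim \rho}\!\left[\,
\big\| \pi_{g(z)}(s) - \pi_{\theta}(s) \big\|^2
\,\right],
\qquad
g : \mathcal{Z} \to \Theta
```

Minimizing this loss over the decoder $g$ and the latent codes $z$ organizes $\mathcal{Z}$ by functional similarity: two latents end up close when the decoded policies act alike on $\rho$, regardless of how far apart their parameter vectors lie in $\Theta$.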
Computer Science > Machine Learning
arXiv:2509.22566 (cs)
[Submitted on 26 Sep 2025 (v1), last revised 24 Feb 2026 (this version, v2)]
Title: From Parameters to Behaviors: Unsupervised Compression of the Policy Space
Authors: Davide Tenedini, Riccardo Zamboni, Mirco Mutti, Marcello Restelli
Abstract: Despite its recent successes, Deep Reinforcement Learning (DRL) is notoriously sample-inefficient. We argue that this inefficiency stems from the standard practice of optimizing policies directly in the high-dimensional and highly redundant parameter space $\Theta$. This challenge is greatly compounded in multi-task settings. In this work, we develop a novel, unsupervised approach that compresses the policy parameter space $\Theta$ into a low-dimensional latent space $\mathcal{Z}$. We train a generative model $g:\mathcal{Z}\to\Theta$ by optimizing a behavioral reconstruction loss, which ensures that the latent space is organized by functional similarity rather than proximity in parameterization. We conjecture that the inherent dimensionality of this manifold is a function of the environment's complexity, rather than the size of the policy network. We validate our approach in continuous control domains, showing that the parameterization of standard policy networks can be compressed up to five orders of magnitude...
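The core mechanism in the abstract — a generative model $g:\mathcal{Z}\to\Theta$ trained with a behavioral reconstruction loss — can be sketched in a toy setting. This is not the authors' implementation: the linear policies (action = Theta @ state), the linear decoder `g(z) = W @ z`, the probe-state distribution, and all dimensions below are simplifying assumptions chosen so the gradients stay analytic; the paper works with full policy networks. The sketch jointly fits the decoder and one latent code per policy so that decoded policies *act* like the originals, rather than matching their parameters directly.

```python
import numpy as np

rng = np.random.default_rng(0)

OBS, ACT = 6, 2              # toy linear policy: action = Theta @ state
D = OBS * ACT                # flat parameter-space dimension (Theta)
LATENT = 3                   # latent-space dimension (Z), LATENT << D
N_POL, N_STATES = 32, 64     # policies to compress, probe states

# "Teacher" policy parameters to compress, and probe states on which
# behavioral similarity is measured.
thetas = rng.normal(size=(N_POL, D))
states = rng.normal(size=(OBS, N_STATES))

# Linear decoder g(z) = W @ z and one latent code per policy (both learned).
W = rng.normal(scale=0.5, size=(D, LATENT))
Z = rng.normal(scale=0.5, size=(N_POL, LATENT))

def behavioral_loss(W, Z):
    """Mean squared action gap between decoded and original policies."""
    gaps = Z @ W.T - thetas                  # (N_POL, D) parameter gaps
    mats = gaps.reshape(N_POL, ACT, OBS)
    return np.mean((mats @ states) ** 2)     # gap measured in ACTION space

cov = states @ states.T / N_STATES           # second moment of probe states
lr = 0.1
loss_start = behavioral_loss(W, Z)
for _ in range(1500):
    gaps = (Z @ W.T - thetas).reshape(N_POL, ACT, OBS)
    # Analytic gradient of the behavioral loss w.r.t. each parameter gap.
    g = ((2.0 / (N_POL * ACT)) * (gaps @ cov)).reshape(N_POL, D)
    grad_W = g.T @ Z                         # chain rule through g(z) = W z
    grad_Z = g @ W
    W -= lr * grad_W
    Z -= lr * grad_Z
loss_end = behavioral_loss(W, Z)
print(f"behavioral loss: {loss_start:.3f} -> {loss_end:.3f}")
```

Note the design choice the loss enforces: two latent codes are rewarded for being decoded into policies that produce similar actions on the probe states, even when the underlying parameter vectors differ — which is exactly what "organized by functional similarity rather than proximity in parameterization" asks for.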