[2603.27044] Unsupervised Behavioral Compression: Learning Low-Dimensional Policy Manifolds through State-Occupancy Matching
Computer Science > Machine Learning
arXiv:2603.27044 (cs) [Submitted on 27 Mar 2026]

Title: Unsupervised Behavioral Compression: Learning Low-Dimensional Policy Manifolds through State-Occupancy Matching
Authors: Andrea Fraschini, Davide Tenedini, Riccardo Zamboni, Mirco Mutti, Marcello Restelli

Abstract: Deep Reinforcement Learning (DRL) is widely recognized as sample-inefficient, a limitation attributable in part to the high dimensionality and substantial functional redundancy of the policy parameter space. A recent framework, which we refer to as Action-based Policy Compression (APC), mitigates this issue by compressing the parameter space $\Theta$ into a low-dimensional latent manifold $\mathcal{Z}$ through a learned generative mapping $g:\mathcal{Z} \to \Theta$. However, its performance is severely constrained by its reliance on immediate action-matching as a reconstruction loss, a myopic proxy for behavioral similarity that suffers from compounding errors across sequential decisions. To overcome this bottleneck, we introduce Occupancy-based Policy Compression (OPC), which enhances APC by shifting the behavior representation from immediate action-matching to long-horizon state-space coverage. Specifically, we propose two principal improvements: (1) we curate the dat...
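The abstract fixes only the core objects: a decoder $g:\mathcal{Z} \to \Theta$ trained to reconstruct behaviors, with the reconstruction loss shifted from immediate action-matching (APC) to long-horizon state-occupancy matching (OPC). The sketch below is a minimal toy illustration of that contrast, not the paper's implementation: the linear-tanh policy class, the point-mass dynamics, the RBF-kernel MMD used as an occupancy distance, and every hyperparameter are illustrative assumptions introduced here.

```python
# Minimal sketch (assumptions throughout; not the authors' code): a decoder
# g: Z -> Theta is fit so that decoded policies match the *state occupancy*
# of target policies; the myopic action-matching gap is shown for contrast.
import torch

STATE_DIM, ACTION_DIM, LATENT_DIM = 2, 2, 3
THETA_DIM = STATE_DIM * ACTION_DIM  # flat policy parameter vector theta

def policy_action(theta, states):
    """Deterministic linear-tanh policy: pi_theta(s) = tanh(s @ W)."""
    W = theta.view(STATE_DIM, ACTION_DIM)
    return torch.tanh(states @ W)

def rollout_states(theta, init_states, horizon=20, step=0.1):
    """States visited by pi_theta under toy point-mass dynamics (assumed)."""
    s, visited = init_states, []
    for _ in range(horizon):
        s = s + step * policy_action(theta, s)
        visited.append(s)
    return torch.cat(visited, dim=0)  # empirical occupancy sample

def mmd2(x, y, bandwidth=0.5):
    """Differentiable RBF-kernel MMD^2 between two occupancy samples."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * bandwidth ** 2))
    return k(x, x).mean() - 2 * k(x, y).mean() + k(y, y).mean()

torch.manual_seed(0)
g = torch.nn.Linear(LATENT_DIM, THETA_DIM)            # generative map g: Z -> Theta
thetas = torch.randn(8, THETA_DIM)                    # "dataset" of target policies
zs = torch.nn.Parameter(torch.randn(8, LATENT_DIM))   # one latent code per policy
opt = torch.optim.Adam([*g.parameters(), zs], lr=1e-2)

init = torch.randn(16, STATE_DIM)   # shared rollout start states
probe = torch.randn(64, STATE_DIM)  # probe states for the myopic loss

for _ in range(200):
    opt.zero_grad()
    loss = 0.0
    for theta, z in zip(thetas, zs):
        # OPC-style objective: match long-horizon state occupancies by
        # differentiating through whole rollouts of the decoded policy.
        loss = loss + mmd2(rollout_states(g(z), init),
                           rollout_states(theta, init))
    loss.backward()
    opt.step()

# Contrast: the APC-style myopic gap can remain nonzero even when
# long-horizon occupancies are closely matched.
with torch.no_grad():
    theta_hat = g(zs[0])
    a_gap = (policy_action(theta_hat, probe)
             - policy_action(thetas[0], probe)).pow(2).mean()
    o_gap = mmd2(rollout_states(theta_hat, init),
                 rollout_states(thetas[0], init))
    print(f"action gap (MSE): {a_gap.item():.4f}   "
          f"occupancy gap (MMD^2): {o_gap.item():.4f}")
```

Under these assumptions, the design point the abstract argues for is visible in the training loop: an action-matching loss would compare single decisions on isolated probe states, while the occupancy loss backpropagates through entire trajectories, so reconstruction errors that would compound over the horizon are penalized directly rather than one step at a time.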