[2604.01843] Investigating Permutation-Invariant Discrete Representation Learning for Spatially Aligned Images
Computer Science > Computer Vision and Pattern Recognition
arXiv:2604.01843 (cs)
[Submitted on 2 Apr 2026]

Title: Investigating Permutation-Invariant Discrete Representation Learning for Spatially Aligned Images
Authors: Jamie S. J. Stirling, Noura Al-Moubayed, Hubert P. H. Shum

Abstract: Vector quantization approaches (VQ-VAE, VQ-GAN) learn discrete neural representations of images, but these representations are inherently position-dependent: codes are spatially arranged and contextually entangled, requiring autoregressive or diffusion-based priors to model their dependencies at sample time. In this work, we ask whether positional information is necessary for discrete representations of spatially aligned data. We propose the permutation-invariant vector-quantized autoencoder (PI-VQ), in which latent codes are constrained to carry no positional information. We find that this constraint encourages codes to capture global, semantic features, and enables direct interpolation between images without a learned prior. To address the reduced information capacity of permutation-invariant representations, we introduce matching quantization, a vector quantization algorithm based on optimal bipartite matching that increases effective bottleneck capacity by $3.5\times$ relative to naive ne...
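The abstract names but does not specify the matching quantization algorithm. As a rough illustration only, a minimal sketch of what "vector quantization based on optimal bipartite matching" could look like, assuming squared-Euclidean assignment costs and SciPy's Hungarian solver (linear_sum_assignment) in place of the paper's actual procedure:

import numpy as np
from scipy.optimize import linear_sum_assignment

def matching_quantize(latents, codebook):
    """Assign each latent vector to a *distinct* codebook entry via
    minimum-cost bipartite matching (a sketch, not the paper's method).

    latents:  (N, D) array of encoder outputs
    codebook: (K, D) array of code vectors, with K >= N
    Returns the quantized vectors and the chosen code indices.
    """
    # Pairwise squared Euclidean distances between latents and codes, shape (N, K).
    cost = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    # Optimal one-to-one assignment: each latent gets its own codebook entry.
    rows, cols = linear_sum_assignment(cost)
    return codebook[cols], cols

# Toy usage: 4 latent vectors matched against an 8-entry codebook.
rng = np.random.default_rng(0)
quantized, codes = matching_quantize(rng.normal(size=(4, 16)),
                                     rng.normal(size=(8, 16)))
print(codes)  # e.g. [5 2 7 0] -- each code index appears at most once

Under this reading, the one-to-one constraint would let a permutation-invariant bottleneck encode a set of N distinct codes rather than an unordered multiset from independent nearest-neighbour lookups, which is one plausible source of the increased effective capacity the abstract reports; the paper itself should be consulted for the actual algorithm.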