[2602.17133] VP-VAE: Rethinking Vector Quantization via Adaptive Vector Perturbation
Summary
The paper introduces VP-VAE, a novel approach to Vector Quantized Variational Autoencoders that decouples representation learning from discretization, enhancing training stability and model robustness.
Why It Matters
This research addresses critical issues in generative modeling, specifically the training instability and inefficiencies associated with traditional VQ-VAEs. By proposing a method that eliminates the need for an explicit codebook during training, it opens avenues for more reliable and effective generative models, which are foundational to many AI applications.
Key Takeaways
- VP-VAE decouples representation learning from discretization, improving training stability.
- The method uses structured perturbations instead of a codebook, enhancing robustness.
- FSP (Finite Scalar Perturbation), a lightweight variant of VP-VAE, provides practical improvements for fixed quantizers.
- Experimental results show improved reconstruction fidelity and balanced token usage.
- The approach mitigates issues related to codebook collapse in VQ-VAEs.
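To make the core idea concrete, the sketch below contrasts a standard VQ lookup (non-differentiable nearest-code snapping) with a codebook-free perturbation step in the spirit of VP-VAE. The uniform additive noise is an illustrative assumption; the paper generates its perturbations via Metropolis-Hastings sampling, which this toy does not reproduce.

```python
import numpy as np

rng = np.random.default_rng(0)

def vq_lookup(z, codebook):
    """Standard VQ step: snap each latent vector to its nearest code.
    Non-differentiable, so training needs a straight-through estimator."""
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return codebook[d.argmin(axis=1)]

def perturb(z, scale):
    """Codebook-free training step: inject a latent perturbation whose
    scale mimics quantization error. Uniform noise is a simplifying
    assumption here, not the paper's adaptive construction."""
    return z + rng.uniform(-scale, scale, size=z.shape)

z = rng.normal(size=(4, 8))          # batch of encoder latents
codebook = rng.normal(size=(16, 8))  # 16 codes of dimension 8

zq = vq_lookup(z, codebook)  # discrete path (inference-style)
zp = perturb(z, scale=0.1)   # smooth, differentiable path (training-style)
```

Because `perturb` is just addition, gradients flow through it directly, which is the mechanism behind the claimed training stability.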
Submitted on 19 Feb 2026 (cs.LG)
Authors: Linwei Zhai, Han Ding, Mingzhi Lin, Cui Zhao, Fei Wang, Ge Wang, Wang Zhi, Wei Xi
Abstract
Vector Quantized Variational Autoencoders (VQ-VAEs) are fundamental to modern generative modeling, yet they often suffer from training instability and "codebook collapse" due to the inherent coupling of representation learning and discrete codebook optimization. In this paper, we propose VP-VAE (Vector Perturbation VAE), a novel paradigm that decouples representation learning from discretization by eliminating the need for an explicit codebook during training. Our key insight is that, from the neural network's viewpoint, performing quantization primarily manifests as injecting a structured perturbation in latent space. Accordingly, VP-VAE replaces the non-differentiable quantizer with distribution-consistent and scale-adaptive latent perturbations generated via Metropolis--Hastings sampling. This design enables stable training without a codebook while making the model robust to inference-time quantization error. Moreover, under the assumption of approximately uniform latent variables, we derive FSP (Finite Scalar Perturbation), a lightweight variant o...
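The abstract's "distribution-consistent" perturbations generated via Metropolis--Hastings sampling can be illustrated with a generic random-walk MH sampler. The Laplace-shaped target for the quantization error and all step sizes below are hypothetical stand-ins; the paper's actual proposal and target distributions are not reproduced here.

```python
import numpy as np

def mh_sample(log_target, n_steps, step=0.1, x0=0.0, seed=0):
    """Random-walk Metropolis-Hastings: produces samples whose
    stationary distribution matches `log_target` (up to a constant).
    A textbook sampler, used only to show how perturbations matching
    a chosen error distribution could be drawn."""
    rng = np.random.default_rng(seed)
    x, lp, out = x0, log_target(x0), []
    for _ in range(n_steps):
        cand = x + rng.normal(scale=step)       # symmetric proposal
        lp_cand = log_target(cand)
        if np.log(rng.uniform()) < lp_cand - lp:  # accept/reject
            x, lp = cand, lp_cand
        out.append(x)
    return np.array(out)

# Hypothetical target: quantization error concentrated near zero,
# modeled as Laplace(0, 0.05) up to an additive constant.
log_err = lambda e: -abs(e) / 0.05

eps = mh_sample(log_err, n_steps=5000)
# eps would then be injected into encoder latents during training.
```

Sampling the perturbation, rather than fixing its scale, is what lets the noise adapt to the latent distribution the encoder actually produces.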