[2510.02826] Multi-scale Autoregressive Models are Laplacian, Discrete, and Latent Diffusion Models in Disguise
Summary
This paper reinterprets Visual Autoregressive (VAR) models as iterative refinement models, making their link to denoising diffusion explicit and identifying the design choices that drive their quality-efficiency trade-off.
Why It Matters
Understanding the relationship between autoregressive models and diffusion processes is crucial for advancing machine learning techniques. This research provides insights into model efficiency and quality, which can impact various applications, including image generation and weather forecasting.
Key Takeaways
- VAR can be formalised as a deterministic forward process that builds a Laplacian-style latent pyramid, paired with a learned backward process that reconstructs samples in a small number of coarse-to-fine steps.
- Three design choices drive VAR's quality-efficiency trade-off: refinement in a learned latent space, discrete prediction over code indices, and decomposition by spatial frequency.
- Controlled experiments isolate each factor's contribution to quality and speed, and the framework extends to permutation-invariant graph generation and probabilistic medium-range weather forecasting.
Computer Science > Machine Learning
arXiv:2510.02826 (cs)
[Submitted on 3 Oct 2025 (v1), last revised 16 Feb 2026 (this version, v2)]
Title: Multi-scale Autoregressive Models are Laplacian, Discrete, and Latent Diffusion Models in Disguise
Authors: Steve Hong, Samuel Belkadi
Abstract: We reinterpret Visual Autoregressive (VAR) models as iterative refinement models to identify which design choices drive their quality-efficiency trade-off. Instead of treating VAR only as next-scale autoregression, we formalise it as a deterministic forward process that builds a Laplacian-style latent pyramid, together with a learned backward process that reconstructs samples in a small number of coarse-to-fine steps. This formulation makes the link to denoising diffusion explicit and highlights three modelling choices that may underlie VAR's efficiency and sample quality: refinement in a learned latent space, discrete prediction over code indices, and decomposition by spatial frequency. We support this view with controlled experiments that isolate the contribution of each factor to quality and speed. We also discuss how the same framework can be adapted to permutation-invariant graph generation and probabilistic medium-range weather forecasting, and how it provides practical points of contact with diffus...
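To make the abstract's framing concrete, the sketch below builds a classic Laplacian pyramid in plain NumPy: a deterministic forward pass that decomposes an image into a coarse base plus per-scale residuals, and a backward pass that reconstructs it coarse-to-fine. This is an illustrative analogy only, not the paper's implementation — VAR operates on learned latent codes and *predicts* each scale's residual with a network, whereas here we store and re-add the true residuals on raw pixels.

```python
import numpy as np

def downsample(x):
    # 2x2 average pooling (assumes even spatial dimensions).
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(x):
    # Nearest-neighbour upsampling by a factor of 2.
    return x.repeat(2, axis=0).repeat(2, axis=1)

def forward_pyramid(img, levels):
    """Deterministic forward process: peel off the high-frequency
    residual at each scale, keeping only a coarse base at the end."""
    residuals = []
    cur = img
    for _ in range(levels):
        coarse = downsample(cur)
        residuals.append(cur - upsample(coarse))  # detail lost at this scale
        cur = coarse
    return cur, residuals  # base image, residuals ordered fine-to-coarse

def backward_reconstruct(base, residuals):
    """Backward process: refine coarse-to-fine by upsampling and adding
    back the residual at each scale (a model would predict these)."""
    cur = base
    for res in reversed(residuals):
        cur = upsample(cur) + res
    return cur

img = np.arange(64, dtype=float).reshape(8, 8)
base, res = forward_pyramid(img, levels=2)   # base: 2x2, residuals: 8x8, 4x4
recon = backward_reconstruct(base, res)
assert np.allclose(recon, img)               # round trip is exact
```

Because each residual is exactly what upsampling loses, the round trip is lossless; the interesting modelling question the paper studies is what happens when those residuals live in a learned latent space and are predicted as discrete code indices instead.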