[2505.18883] Partition Generative Modeling: Masked Modeling Without Masks
Summary
The paper introduces Partition Generative Models (PGMs), a novel approach to generative modeling that eliminates mask tokens, improving throughput and performance compared to existing masked generative models.
Why It Matters
This research addresses efficiency limitations of current masked generative models: at every sampling step they process the full sequence, including uninformative mask tokens. By eliminating those tokens while preserving parallel, any-order generation, PGMs offer a faster sampling path that could shape future generative modeling systems.
Key Takeaways
- PGMs replace masking with partitioning, allowing for efficient token generation.
- They achieve 5-5.5x higher throughput than MDLM on OpenWebText and a 7.5x throughput improvement over MaskGIT on ImageNet.
- PGMs maintain compatibility with existing MGM samplers and distillation methods.
Computer Science > Machine Learning
arXiv:2505.18883 (cs)
[Submitted on 24 May 2025 (v1), last revised 17 Feb 2026 (this version, v3)]
Title: Partition Generative Modeling: Masked Modeling Without Masks
Authors: Justin Deschenaux, Lan Tran, Caglar Gulcehre
Abstract: Masked generative models (MGMs) can generate tokens in parallel and in any order, unlike autoregressive models (ARMs), which decode one token at a time, left-to-right. However, MGMs process the full-length sequence at every sampling step, including mask tokens that carry no information. In contrast, ARMs process only the previously generated tokens. We introduce "Partition Generative Models" (PGMs), which replace masking with partitioning. Tokens are split into two groups that cannot attend to each other, and the model learns to predict each group conditioned on the other, eliminating mask tokens entirely. Because the groups do not interact, PGMs can process only the clean tokens during sampling, like ARMs, while retaining parallel, any-order generation, like MGMs. On OpenWebText, PGMs achieve $5-5.5\times$ higher throughput than MDLM while producing samples with lower Generative Perplexity. On ImageNet, PGMs reach comparable FID to MaskGIT with a $7.5\times$ throughput improvement. With twice as many steps, the FID improves to 4.56 while remaining $3...
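The core mechanism described in the abstract, two token groups that never attend to each other, can be pictured as a group-diagonal attention mask. Below is a minimal NumPy sketch of such a mask; the function name and 0/1 group encoding are illustrative assumptions, not details from the paper:

```python
import numpy as np

def partition_attention_mask(groups: np.ndarray) -> np.ndarray:
    """Boolean mask where entry (i, j) is True iff tokens i and j
    belong to the same partition, so attention never crosses groups."""
    g = np.asarray(groups)
    return g[:, None] == g[None, :]

# Toy example: 4 tokens with alternating group labels (hypothetical).
groups = np.array([0, 1, 0, 1])
mask = partition_attention_mask(groups)
```

In this toy case, tokens 0 and 2 may attend to each other, but never to tokens 1 and 3 (and vice versa), which is what lets the model predict one group from the other without introducing mask tokens.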