[2411.00759] Minibatch Optimal Transport and Perplexity Bound Estimation in Discrete Flow Matching
Summary
This paper introduces a novel approach to discrete flow matching based on minibatch optimal transport, improving generative perplexity while reducing the number of state transitions needed during sampling.
Why It Matters
The research addresses limitations of discrete flow matching, a growing area in machine learning. By proposing efficient methods to minimize state transitions and to bound perplexity, it contributes to advances in generative models, which are central to applications such as natural language processing.
Key Takeaways
- Introduces a dynamic-optimal-transport-like minimization objective for discrete flows.
- Demonstrates up to a 32-fold reduction in transitions (from 1024 to 32) to reach similar generative perplexity.
- Proposes two upper bounds on perplexity for improved training and evaluation.
- Introduces Multimask Flows that outperform existing models in generative perplexity.
- Highlights the importance of minibatch strategies in optimizing transport costs.
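The minibatch transport idea in the takeaways above — pairing source and target samples within each batch so that a similarity-based transport cost is minimized — can be sketched in a few lines. This is not the paper's implementation: the Hamming-distance cost (which counts the token positions that must change, i.e. state transitions) and the Hungarian solver are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def minibatch_ot_pairing(source, target):
    """Pair each source sequence with a target sequence so that the total
    Hamming distance (number of differing token positions, a proxy for the
    number of state transitions) over the minibatch is minimized."""
    # Cost matrix: entry (i, j) counts positions where source[i] and
    # target[j] disagree; shapes broadcast to (batch, batch, length).
    cost = (source[:, None, :] != target[None, :, :]).sum(axis=2)
    rows, cols = linear_sum_assignment(cost)  # Hungarian algorithm
    return cols, int(cost[rows, cols].sum())

rng = np.random.default_rng(0)
src = rng.integers(0, 4, size=(8, 16))  # 8 sequences, 16 tokens, vocab of 4
tgt = rng.integers(0, 4, size=(8, 16))
perm, matched_cost = minibatch_ot_pairing(src, tgt)
identity_cost = int((src != tgt).sum())  # cost of the unpermuted pairing
assert matched_cost <= identity_cost  # OT pairing never costs more
```

The identity pairing is one feasible assignment, so the optimal matching can only lower the total cost; larger batches give the solver more pairings to exploit, at quadratic cost in batch size.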
Abstract
Computer Science > Machine Learning — arXiv:2411.00759 (cs)
[Submitted on 1 Nov 2024 (v1), last revised 23 Feb 2026 (this version, v4)]
Authors: Etrit Haxholli, Yeti Z. Gurbuz, Ogul Can, Eli Waxman
Discrete flow matching, a recent framework for modeling categorical data, has shown competitive performance with autoregressive models. However, unlike continuous flow matching, the rectification strategy cannot be applied due to the stochasticity of discrete paths, necessitating alternative methods to minimize state transitions. We propose a dynamic-optimal-transport-like minimization objective and derive its Kantorovich formulation for discrete flows with convex interpolants, where transport cost depends solely on inter-state similarity and can be optimized via minibatch strategies. We show that such methods can reduce the number of transitions up to 32 times (1024 to 32) to reach the same generative perplexity without compromising diversity. Additionally, path nondeterminism in discrete flows precludes an instantaneous change-of-variables analogue, preventing precise probability estimation available to continuous flows. We therefore propose two upper bounds on perplexity, enabling principled training...
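The abstract's perplexity-bound idea rests on a simple monotonicity fact: perplexity is the exponential of the average per-token negative log-likelihood, so any upper bound on that average immediately yields an upper bound on perplexity. A minimal numeric sketch (all NLL values here are made up for illustration, not the paper's bounds):

```python
import math

def perplexity(avg_nll):
    """Perplexity is exp of the average per-token negative
    log-likelihood, measured in nats."""
    return math.exp(avg_nll)

# Exact likelihoods are unavailable for discrete flows (no instantaneous
# change-of-variables), but suppose we can compute an upper bound on the
# average per-token NLL (hypothetical value):
nll_upper_bound = 2.45
ppl_bound = perplexity(nll_upper_bound)

# Monotonicity of exp: the true (unknown) NLL lies below the bound,
# so the true perplexity lies below exp(bound).
true_nll = 2.30  # hypothetical, unknown in practice
assert perplexity(true_nll) <= ppl_bound
```

This is why a computable NLL bound is enough for principled training and evaluation: minimizing the bound provably caps the model's generative perplexity.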