Guided Transfer Learning for Discrete Diffusion Models (arXiv:2512.10877)
Summary
This paper introduces Guided Transfer Learning (GTL) for discrete diffusion models, addressing challenges in small-data scenarios and offering a practical algorithm for efficient sampling from target distributions.
Why It Matters
The research tackles the limitations of discrete diffusion models in small-data environments, which are common in real-world applications. By proposing GTL, the study enhances the adaptability of these models, potentially improving performance in various language modeling tasks and other applications where data scarcity is an issue.
Key Takeaways
- GTL enables efficient sampling from target distributions without modifying pretrained denoisers.
- The algorithm reduces computational costs to linear scaling in vocabulary size, facilitating longer sequence generation.
- GTL is particularly effective in small-data scenarios, outperforming traditional weight fine-tuning methods.
- A key limitation of GTL arises when source and target distributions overlap poorly, affecting transfer performance.
- The study provides empirical evaluations on synthetic Markov chains and language modeling tasks.
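The core mechanism behind the takeaways above — steering a frozen pretrained denoiser toward a target distribution by adding a log density-ratio term to its per-token logits, at cost linear in vocabulary size — can be illustrated with a minimal sketch. All names here (`guided_sampling_step`, `log_ratio`, `guidance_scale`) are hypothetical, and this does not reproduce the paper's GTL scheduling mechanism; it only shows the ratio-based guidance arithmetic.

```python
import numpy as np

def guided_sampling_step(denoiser_logits, log_ratio, guidance_scale=1.0, seed=0):
    """One illustrative guided sampling step (hypothetical interface).

    denoiser_logits: (positions, vocab) logits from a frozen pretrained denoiser.
    log_ratio:       (positions, vocab) estimated log p_target/p_source per token.
    Combining them and normalizing is linear in vocabulary size.
    """
    rng = np.random.default_rng(seed)
    guided = denoiser_logits + guidance_scale * log_ratio
    # Numerically stable softmax over the vocabulary axis.
    guided = guided - guided.max(axis=-1, keepdims=True)
    probs = np.exp(guided)
    probs /= probs.sum(axis=-1, keepdims=True)
    # Sample one token id per sequence position.
    return np.array([rng.choice(probs.shape[-1], p=p) for p in probs])
```

With a uniform denoiser and a ratio term that strongly favors one token per position, the guided step samples those tokens; the pretrained denoiser's weights are never modified, matching the first takeaway above.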
Computer Science > Machine Learning
arXiv:2512.10877 (cs)
[Submitted on 11 Dec 2025 (v1), last revised 20 Feb 2026 (this version, v2)]
Title: Guided Transfer Learning for Discrete Diffusion Models
Authors: Julian Kleutgens, Claudio Battiloro, Lingkai Kong, Benjamin Grewe, Francesca Dominici, Mauricio Tec
Abstract: Discrete diffusion models (DMs) have achieved strong performance in language and other discrete domains, offering a compelling alternative to autoregressive modeling. Yet this performance typically depends on large training datasets, challenging the performance of DMs in small-data regimes -- common under real-world constraints. Aimed at this challenge, recent work in continuous DMs suggests that transfer learning via classifier ratio-based guidance can adapt a pretrained DM to a related target distribution, often outperforming alternatives such as full-weight fine-tuning on the target data. By contrast, transfer learning for discrete DMs remains unexplored. We address this gap by exploring practical analogues of ratio-based transfer learning for discrete DMs. Our theoretical analysis shows that a direct extension of existing ratio-based guidance is computationally prohibitive, scaling with vocabulary size. To overcome this limitation, we introduce a scheduling mechanism that yields a practical algorithm, Guided ...