[2509.16117] DiffusionNFT: Online Diffusion Reinforcement with Forward Process
Summary
The paper presents DiffusionNFT, a novel online reinforcement learning paradigm that optimizes diffusion models directly on the forward process, enhancing efficiency and performance in generative tasks.
Why It Matters
DiffusionNFT addresses significant challenges in applying reinforcement learning to diffusion models, particularly the inefficiencies and complexities of existing methods. By improving training efficiency and folding reinforcement signals directly into a supervised learning objective, this research makes RL post-training of diffusion models more practical for generative AI applications.
Key Takeaways
- DiffusionNFT optimizes diffusion models directly on the forward process, enhancing efficiency.
- The method eliminates the need for likelihood estimation and complicated integration with classifier-free guidance.
- DiffusionNFT is significantly more efficient than existing methods, achieving better performance in fewer steps.
- The approach allows for training with arbitrary black-box solvers, broadening its applicability.
- Utilizing multiple reward models, DiffusionNFT improves performance across various benchmarks.
Computer Science > Machine Learning
arXiv:2509.16117 (cs)
[Submitted on 19 Sep 2025 (v1), last revised 16 Feb 2026 (this version, v2)]
Title: DiffusionNFT: Online Diffusion Reinforcement with Forward Process
Authors: Kaiwen Zheng, Huayu Chen, Haotian Ye, Haoxiang Wang, Qinsheng Zhang, Kai Jiang, Hang Su, Stefano Ermon, Jun Zhu, Ming-Yu Liu
Abstract: Online reinforcement learning (RL) has been central to post-training language models, but its extension to diffusion models remains challenging due to intractable likelihoods. Recent works discretize the reverse sampling process to enable GRPO-style training, yet they inherit fundamental drawbacks, including solver restrictions, forward-reverse inconsistency, and complicated integration with classifier-free guidance (CFG). We introduce Diffusion Negative-aware FineTuning (DiffusionNFT), a new online RL paradigm that optimizes diffusion models directly on the forward process via flow matching. DiffusionNFT contrasts positive and negative generations to define an implicit policy improvement direction, naturally incorporating reinforcement signals into the supervised learning objective. This formulation enables training with arbitrary black-box solvers, eliminates the need for likelihood estimation, and requires only clean images rather than sampling trajectories for policy...
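The core idea, a reward-contrastive weighting of a forward-process flow-matching loss on clean images, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's actual algorithm: the linear-interpolation path, the dummy velocity model, and the use of group-normalized rewards as signed per-sample weights are simplifications; the real DiffusionNFT objective constructs an implicit improvement policy from the positive/negative split rather than applying a signed loss weight directly.

```python
import torch

def flow_matching_loss(model, x1, weight):
    """Per-sample-weighted conditional flow-matching loss on clean samples x1.

    Uses the linear interpolation path x_t = (1 - t) * x0 + t * x1 with
    target velocity x1 - x0 (a common flow-matching parameterization;
    assumed here, not taken from the paper).
    """
    x0 = torch.randn_like(x1)                       # noise endpoint of the path
    t = torch.rand(x1.shape[0], *([1] * (x1.dim() - 1)))
    xt = (1 - t) * x0 + t * x1                      # forward-process interpolant
    v_target = x1 - x0                              # conditional target velocity
    v_pred = model(xt, t.flatten())
    per_sample = ((v_pred - v_target) ** 2).flatten(1).mean(dim=1)
    return (weight * per_sample).mean()

def negative_aware_loss(model, images, rewards):
    """Hedged sketch of negative-aware weighting.

    Above-average-reward generations get positive weight (pulled toward),
    below-average ones get negative weight (pushed away). Only clean images
    and scalar rewards are needed -- no sampling trajectories or likelihoods.
    """
    advantage = rewards - rewards.mean()            # group-relative advantage
    return flow_matching_loss(model, images, advantage)

# Toy usage with a dummy velocity predictor (hypothetical shapes).
torch.manual_seed(0)
model = lambda x, t: torch.zeros_like(x)
images = torch.randn(8, 3, 4, 4)                    # a "group" of 8 generations
rewards = torch.randn(8)                            # scores from a reward model
loss = negative_aware_loss(model, images, rewards)
```

Note how the sampler never appears: the loss touches only the forward (noising) process, which is why training is compatible with arbitrary black-box solvers used at generation time.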