[2602.19261] DGPO: RL-Steered Graph Diffusion for Neural Architecture Generation
Summary
The paper presents DGPO, a method that fine-tunes directed graph diffusion models with reinforcement learning to generate neural architectures, matching the benchmark optimum on all three NAS-Bench-201 tasks.
Why It Matters
This research is significant as it addresses the limitations of existing graph diffusion methods that do not account for the directed nature of neural architectures. By introducing DGPO, the authors provide a new framework that enhances the efficiency and effectiveness of neural architecture search, which is crucial for advancing machine learning applications.
Key Takeaways
- DGPO extends reinforcement learning to directed acyclic graphs for neural architecture generation.
- The method demonstrates high performance on NAS-Bench-101 and NAS-Bench-201 benchmarks.
- Transferable structural priors allow DGPO to generate architectures close to optimal with minimal training data.
- Bidirectional control experiments validate the effectiveness of reward-driven steering in architecture generation.
- This approach provides a controllable framework for generating directed combinatorial structures.
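The DAG handling the takeaways mention rests on two standard ingredients: a topological ordering of the nodes and a positional encoding of each node's rank in that ordering. A minimal sketch of both, assuming Kahn's algorithm and sinusoidal encodings (the function names and the toy cell below are illustrative, not the authors' code):

```python
from collections import deque
import math

def topological_order(adj):
    """Kahn's algorithm: return the nodes of a DAG in topological order."""
    indeg = {v: 0 for v in adj}
    for v in adj:
        for w in adj[v]:
            indeg[w] += 1
    queue = deque(v for v in adj if indeg[v] == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    if len(order) != len(adj):
        raise ValueError("graph contains a cycle")
    return order

def positional_encoding(position, dim):
    """Standard sinusoidal encoding of a node's topological position."""
    return [
        math.sin(position / 10000 ** (i / dim)) if i % 2 == 0
        else math.cos(position / 10000 ** ((i - 1) / dim))
        for i in range(dim)
    ]

# Toy NAS-style cell: input -> conv -> pool -> output, plus a skip edge.
cell = {
    "input": ["conv", "pool"],
    "conv": ["pool", "output"],
    "pool": ["output"],
    "output": [],
}
order = topological_order(cell)
features = {node: positional_encoding(i, 8) for i, node in enumerate(order)}
print(order)  # ['input', 'conv', 'pool', 'output']
```

Because the ordering respects edge direction, the encodings carry the data-flow information that undirected graph diffusion would otherwise discard.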
arXiv Details
Computer Science > Machine Learning, arXiv:2602.19261 (cs). Submitted on 22 Feb 2026.
Title: DGPO: RL-Steered Graph Diffusion for Neural Architecture Generation
Authors: Aleksei Liuliakov, Luca Hermes, Barbara Hammer
Abstract: Reinforcement learning fine-tuning has proven effective for steering generative diffusion models toward desired properties in image and molecular domains. Graph diffusion models have similarly been applied to combinatorial structure generation, including neural architecture search (NAS). However, neural architectures are directed acyclic graphs (DAGs) where edge direction encodes functional semantics such as data flow, information that existing graph diffusion methods, designed for undirected structures, discard. We propose Directed Graph Policy Optimization (DGPO), which extends reinforcement learning fine-tuning of discrete graph diffusion models to DAGs via topological node ordering and positional encoding. Validated on NAS-Bench-101 and NAS-Bench-201, DGPO matches the benchmark optimum on all three NAS-Bench-201 tasks (91.61%, 73.49%, 46.77%). The central finding is that the model learns transferable structural priors: pretrained on only 7% of the search space, it generates near-oracle architectures after fine-tuning, within 0.32 percentage points of the full-data model and extrapolat...
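The reward-driven steering the abstract describes follows the usual policy-gradient recipe: sample from the generator, score the sample with a reward, and nudge the sampling distribution toward high-reward outputs. A toy REINFORCE sketch with a stand-in Bernoulli edge generator (not the paper's diffusion model; the target pattern and learning rate are illustrative assumptions):

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Stand-in generator: independent Bernoulli probabilities over 5 candidate
# edges, playing the role of the diffusion model's learned edge distribution.
logits = [0.0] * 5
target = [1, 0, 1, 1, 0]  # hypothetical high-reward edge pattern
lr = 0.5
baseline = 0.0

for step in range(500):
    probs = [sigmoid(l) for l in logits]
    sample = [1 if random.random() < p else 0 for p in probs]
    # Reward: fraction of sampled edges matching the target pattern.
    reward = sum(int(s == t) for s, t in zip(sample, target)) / len(target)
    baseline = 0.9 * baseline + 0.1 * reward  # moving-average baseline
    advantage = reward - baseline
    # REINFORCE: d(log p(sample))/d(logit) for a Bernoulli is (sample - prob).
    for i in range(len(logits)):
        logits[i] += lr * advantage * (sample[i] - probs[i])

final_probs = [round(sigmoid(l), 2) for l in logits]
print(final_probs)  # edge probabilities drift toward the target pattern
```

In DGPO the generator is the denoising process of a discrete graph diffusion model and the reward comes from benchmark accuracy, but the update direction is the same score-function gradient shown here.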