[2602.13136] Order Matters in Retrosynthesis: Structure-aware Generation via Reaction-Center-Guided Discrete Flow Matching
Summary
The paper presents a template-free framework for retrosynthesis, showing that the ordering of atoms in neural sequence representations matters: placing reaction-center atoms first turns implicit chemical knowledge into an explicit positional signal that improves reaction prediction.
Why It Matters
This research addresses limitations in existing retrosynthesis methods by introducing a structure-aware framework that improves learning efficiency and predictive performance. By prioritizing reaction-center atoms in the sequence, it offers a more effective approach to modeling chemical synthesis, with potential impact on drug discovery and materials science.
Key Takeaways
- Introduces a structure-aware template-free framework for retrosynthesis.
- Highlights the significance of atom ordering in improving model performance.
- Achieves state-of-the-art results on USPTO datasets with significantly less data.
- Demonstrates that structural priors can outperform larger models without proper ordering.
- Proposes a novel graph transformer architecture with rotary position embeddings.
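The core ordering idea can be illustrated with a minimal sketch. The helper below is hypothetical (not the authors' code): given a molecule's atom indices and the subset identified as the reaction center, it builds a permutation that places reaction-center atoms at the head of the sequence, preserving relative order within each group.

```python
def reaction_center_first_order(num_atoms, reaction_center):
    """Return an atom ordering with reaction-center atoms first.

    Relative order is preserved within the reaction-center group and
    within the remaining atoms (a stable partition of the indices).
    """
    rc = set(reaction_center)
    head = [i for i in range(num_atoms) if i in rc]       # reaction-center atoms
    tail = [i for i in range(num_atoms) if i not in rc]   # all other atoms
    return head + tail

# Example: a 6-atom molecule whose reaction center is atoms {2, 4}.
order = reaction_center_first_order(6, {2, 4})
print(order)  # [2, 4, 0, 1, 3, 5]
```

In practice this permutation would be applied to the molecular graph before tokenization (e.g. via an atom-renumbering routine in a cheminformatics toolkit), so that the model's positional encoding consistently assigns the earliest positions to chemically critical atoms.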
Paper Details
Computer Science > Machine Learning — arXiv:2602.13136 (cs), submitted 13 Feb 2026
Title: Order Matters in Retrosynthesis: Structure-aware Generation via Reaction-Center-Guided Discrete Flow Matching
Authors: Chenguang Wang, Zihan Zhou, Lei Bai, Tianshu Yu
Abstract: Template-free retrosynthesis methods treat the task as black-box sequence generation, limiting learning efficiency, while semi-template approaches rely on rigid reaction libraries that constrain generalization. We address this gap with a key insight: atom ordering in neural representations matters. Building on this insight, we propose a structure-aware template-free framework that encodes the two-stage nature of chemical reactions as a positional inductive bias. By placing reaction center atoms at the sequence head, our method transforms implicit chemical knowledge into explicit positional patterns that the model can readily capture. The proposed RetroDiT backbone, a graph transformer with rotary position embeddings, exploits this ordering to prioritize chemically critical regions. Combined with discrete flow matching, our approach decouples training from sampling and enables generation in 20-50 steps versus 500 for prior diffusion methods. Our method achieves state-of-the-art performance on both USPTO-50k...
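The abstract credits rotary position embeddings (RoPE) with letting the backbone exploit the reaction-center-first ordering. The paper's RetroDiT specifics are not reproduced here; the following is a generic, minimal sketch of RoPE applied to a single feature vector, where consecutive dimension pairs (2i, 2i+1) are rotated by an angle that grows with sequence position.

```python
import math

def rope(vec, pos, base=10000.0):
    """Apply rotary position embedding to one vector at position `pos`.

    Each dimension pair (2i, 2i+1) is rotated by pos * base**(-i/d),
    so earlier positions (e.g. reaction-center atoms placed at the
    sequence head) receive the smallest rotations.
    """
    d = len(vec)
    out = [0.0] * d
    for i in range(0, d, 2):
        theta = pos * (base ** (-i / d))
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out[i] = x * c - y * s      # 2D rotation of the pair
        out[i + 1] = x * s + y * c
    return out

# At position 0 the rotation is the identity.
print(rope([1.0, 0.0, 1.0, 0.0], 0))  # [1.0, 0.0, 1.0, 0.0]
```

Because each pair undergoes a pure rotation, the vector norm is preserved, and the dot product between two rotated queries/keys depends only on their relative positions, which is the property attention layers exploit.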