[2602.12162] Amortized Molecular Optimization via Group Relative Policy Optimization
Summary
The paper introduces GRXForm, a molecular optimization method that fine-tunes a pre-trained Graph Transformer with Group Relative Policy Optimization (GRPO). Unlike instance optimizers, which restart the search for every input structure, it learns a single policy intended to transfer to unseen molecular structures.
Why It Matters
State-of-the-art methods for structurally altering a given molecule or fragment act as instance optimizers: they restart the search for every input structure, which costs significant compute. A learned policy can amortize that cost across inputs, but existing model-based methods generalize poorly. The paper attributes this to high variance caused by starting structures of very different difficulty. Reducing that variance could make molecular design cheaper in fields such as drug discovery and materials science.
Key Takeaways
- GRXForm adapts a pre-trained Graph Transformer that optimizes molecules via sequential atom-and-bond additions.
- Group Relative Policy Optimization mitigates variance by normalizing rewards relative to the starting structure.
- The approach generalizes well to out-of-distribution molecular scaffolds.
- It achieves competitive results in multi-objective optimization compared to leading methods.
- It reduces the need for inference-time oracle calls, which improves efficiency.
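The variance-reduction idea in the takeaways can be illustrated with a minimal sketch of group-relative reward normalization. It is not the paper's implementation: the function name, the example rewards, and the choice to divide by the within-group standard deviation (as in common GRPO formulations) are assumptions made here for illustration. Each group holds the rewards of several rollouts that began from the same starting structure.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards_by_start, eps=1e-8):
    """Normalize each rollout's reward against the other rollouts that
    began from the same starting structure (one group per start).

    Hypothetical helper for illustration; dividing by the group's
    standard deviation is an assumption, not confirmed by the source.
    """
    advantages = {}
    for start, rewards in rewards_by_start.items():
        mu = mean(rewards)
        sigma = pstdev(rewards)  # population std within the group
        # eps avoids division by zero when all rewards in a group match
        advantages[start] = [(r - mu) / (sigma + eps) for r in rewards]
    return advantages


# Made-up rewards: the "easy" scaffold scores high on every rollout and
# the "hard" one scores low, so raw rewards mostly reflect difficulty.
rewards_by_start = {
    "easy_scaffold": [0.90, 0.80, 0.70],
    "hard_scaffold": [0.10, 0.20, 0.30],
}
advantages = group_relative_advantages(rewards_by_start)
```

After normalization, both groups are centered at zero and share the same scale, so the learning signal reflects how good a rollout was for its own starting structure rather than how difficult that structure was.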
Computer Science > Machine Learning
arXiv:2602.12162 (cs)
[Submitted on 12 Feb 2026 (v1), last revised 19 Feb 2026 (this version, v2)]
Title: Amortized Molecular Optimization via Group Relative Policy Optimization
Authors: Muhammad bin Javaid, Hasham Hussain, Ashima Khanna, Berke Kisin, Jonathan Pirnay, Alexander Mitsos, Dominik G. Grimm, Martin Grohe
Abstract: Molecular design encompasses tasks ranging from de-novo design to structural alteration of given molecules or fragments. For the latter, state-of-the-art methods predominantly function as "Instance Optimizers", expending significant compute restarting the search for every input structure. While model-based approaches theoretically offer amortized efficiency by learning a policy transferable to unseen structures, existing methods struggle to generalize. We identify a key failure mode: the high variance arising from the heterogeneous difficulty of distinct starting structures. To address this, we introduce GRXForm, adapting a pre-trained Graph Transformer model that optimizes molecules via sequential atom-and-bond additions. We employ Group Relative Policy Optimization (GRPO) for goal-directed fine-tuning to mitigate variance by normalizing rewards relative to the starting structure. Empirically, GRXForm generalizes to out-of-distribution molecular scaffolds without ...