[2602.13419] Protect$^*$: Steerable Retrosynthesis through Neuro-Symbolic State Encoding
Summary
The paper introduces Protect$^*$, a neuro-symbolic framework that enhances retrosynthesis by integrating Large Language Models (LLMs) with rule-based chemical logic to produce reliable synthetic pathways.
Why It Matters
This research addresses a significant challenge in synthetic chemistry by providing a method to guide LLMs in avoiding chemically sensitive sites. It enhances the reliability of automated retrosynthesis, which is crucial for drug discovery and materials science, making it relevant for both AI and chemistry communities.
Key Takeaways
- Protect$^*$ combines LLMs with rule-based chemical logic for improved retrosynthesis.
- The framework offers both automatic and human-in-the-loop modes for flexibility.
- Active state tracking ensures that reactive sites are protected during synthesis.
- Case studies demonstrate the framework's effectiveness in discovering synthetic pathways.
- Grounding neural generation in symbolic logic enhances reliability and autonomy.
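The "active state tracking" takeaway can be made concrete with a small sketch. The snippet below is illustrative only, assuming a planner that records a protection status per reactive site and refuses to proceed while any sensitive site is unguarded; the class and site names (`Site`, `ProtectionState`, `amine_C3`) are hypothetical and not from the paper.

```python
# Toy sketch of active state tracking: each reactive site carries a
# protection status, and a deterministic symbolic check blocks any
# reaction step while a sensitive site is still exposed.
# All names here are illustrative, not the paper's actual API.
from dataclasses import dataclass


@dataclass
class Site:
    name: str          # e.g. "amine_C3" for a primary amine at carbon 3
    sensitive: bool    # would the planned reaction damage this site?
    protected: bool = False
    group: str = ""    # which protecting group guards it, if any


class ProtectionState:
    def __init__(self, sites):
        self.sites = {s.name: s for s in sites}

    def protect(self, name, group):
        site = self.sites[name]
        site.protected = True
        site.group = group

    def deprotect(self, name):
        self.sites[name].protected = False
        self.sites[name].group = ""

    def reaction_allowed(self):
        # Symbolic invariant: every sensitive site must be guarded.
        return all(s.protected for s in self.sites.values() if s.sensitive)


sites = [Site("amine_C3", sensitive=True), Site("ketone_C7", sensitive=False)]
state = ProtectionState(sites)
print(state.reaction_allowed())   # the exposed amine blocks the step
state.protect("amine_C3", "Boc")  # Boc is a common amine protecting group
print(state.reaction_allowed())   # now the step may proceed
```

Tracking this state explicitly, rather than trusting the LLM's free-form output, is what lets the framework reject chemically invalid steps deterministically.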
Subjects: Quantitative Biology > Quantitative Methods
arXiv:2602.13419 (q-bio) [Submitted on 13 Feb 2026]
Authors: Shreyas Vinaya Sathyanarayana, Shah Rahil Kirankumar, Sharanabasava D. Hiremath, Bharath Ramsundar
Abstract: Large Language Models (LLMs) have shown remarkable potential in scientific domains like retrosynthesis; yet, they often lack the fine-grained control necessary to navigate complex problem spaces without error. A critical challenge is directing an LLM to avoid specific, chemically sensitive sites on a molecule - a task where unconstrained generation can lead to invalid or undesirable synthetic pathways. In this work, we introduce Protect$^*$, a neuro-symbolic framework that grounds the generative capabilities of LLMs in rigorous chemical logic. Our approach combines automated rule-based reasoning - using a comprehensive database of 55+ SMARTS patterns and 40+ characterized protecting groups - with the generative intuition of neural models. The system operates via a hybrid architecture: an ``automatic mode'' where symbolic logic deterministically identifies and guards reactive sites, and a ``human-in-the-loop mode'' that integrates expert strategic constraints. Through ``activ...
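The two modes described in the abstract can be sketched as a single lookup with an override path. This is a minimal illustration, assuming the rule base maps a detected functional group to a default protecting group; the function name, the table contents, and the group labels are hypothetical examples, not the paper's actual 40+ entry database or 55+ SMARTS patterns.

```python
# Illustrative rule table: detected functional group -> default guard.
# Entries are common textbook protecting groups, chosen as examples.
DEFAULT_GUARDS = {
    "primary_amine": "Boc",       # tert-butyloxycarbonyl
    "alcohol": "TBS",             # tert-butyldimethylsilyl ether
    "carboxylic_acid": "methyl_ester",
}


def choose_guards(detected_groups, expert_overrides=None):
    """Automatic mode: deterministic lookup in the symbolic rule base.
    Human-in-the-loop mode: expert-supplied constraints take precedence."""
    overrides = expert_overrides or {}
    plan = {}
    for group in detected_groups:
        if group in overrides:
            plan[group] = overrides[group]       # expert constraint wins
        elif group in DEFAULT_GUARDS:
            plan[group] = DEFAULT_GUARDS[group]  # symbolic rule base
        # groups with no known guard are left for the generative model
    return plan


# Automatic mode: purely rule-driven.
print(choose_guards(["primary_amine", "alcohol"]))
# Human-in-the-loop mode: an expert swaps Boc for Cbz on the amine.
print(choose_guards(["primary_amine"], {"primary_amine": "Cbz"}))
```

The point of the hybrid design is that this deterministic layer constrains what the neural model may propose, rather than the neural model constraining itself.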