SAIR: Accelerating Pharma R&D with AI-Powered Structural Intelligence
A Blog post by SandboxAQ on Hugging Face
Team Article. Published September 2, 2025.

Authors: Arman Zaribafiyan, Georgia Channing, Rudi Plesch, Zane Beckwith (SandboxAQ)

This summer, SandboxAQ released the Structurally Augmented IC50 Repository (SAIR), the largest dataset of co-folded 3D protein-ligand structures paired with experimentally measured IC₅₀ labels. By directly linking molecular structure to drug potency, it addresses a long-standing scarcity of training data.

The dataset is now available on Hugging Face. For the first time, researchers have open access to more than 5 million AI-generated, high-accuracy protein-ligand 3D structures, each paired with validated empirical binding potency data. SAIR is open and freely available under the permissive CC BY 4.0 license, so it can be used immediately in both commercial and non-commercial R&D pipelines.

More than just a dataset, SAIR is a strategic asset that bridges the long-standing data gap in AI-powered drug design. It empowers pharmaceutical, biotech, and tech-bio leaders to accelerate R&D, expand target horizons, and supercharge AI models, moving more of the costly, lengthy work of drug design and optimization from the wet lab to in silico methods. This means shorter hit-to-lead timelines, more efficient lead optimization, fewer dead-end pr...