Compilation of literature examples of generative drug (candidate) design that demonstrate experimental validation at least in vitro. Examples that also include in vivo validation are specifically noted.
This compilation builds on our Review Paper and continues to compile literature examples for an up-to-date resource.
The review article is the result of an awesome collaboration with Yuanqi Du, Arian Jamasb, Tianfan Fu, Charlie Harris, Yingheng Wang, Chenru Duan, Pietro Liò, Philippe Schwaller, and Tom L. Blundell!
@article{du2024machine,
title={Machine learning-aided generative molecular design},
author={Du, Yuanqi and Jamasb, Arian R and Guo, Jeff and Fu, Tianfan and Harris, Charles and Wang, Yingheng and Duan, Chenru and Li{\`o}, Pietro and Schwaller, Philippe and Blundell, Tom L},
journal={Nature Machine Intelligence},
pages={1--16},
year={2024},
publisher={Nature Publishing Group UK London}
}
Please let me know if any examples are missing! 🙂
Fun fact (as of December 21, 2024): 33/58 examples are from 2024!
Every entry contains the following information:
- Publication Date - Paper Link
- Target - Design Task
- Model (Input: [Molecular Representation], Output: [Molecular Representation])
- Hit Rate (Number of synthesized examples with IC50 < 10µM or EC50 < 10µM) - NOTE: Designs that underwent manual domain-expert modifications are excluded
- Outcome (denoted nM if < 10 nM) - Most Potent Design (NOTE: Most potent without any domain-expert modifications. This is in contrast to our Review Paper which reports the final outcome)
- Notes (if applicable)
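As a rough illustration of the Hit Rate and Outcome conventions above, the sketch below counts compounds under the 10 µM threshold and labels the outcome class. The function names and the input format (potencies given in nM) are assumptions made for this example and are not part of the compilation itself.

```python
from typing import Sequence

HIT_THRESHOLD_NM = 10_000.0   # 10 µM expressed in nM
NM_CLASS_THRESHOLD_NM = 10.0  # outcome is denoted "nM" only below 10 nM


def hit_rate(potencies_nm: Sequence[float]) -> tuple[int, int, float]:
    """Return (hits, synthesized, percent) for IC50/EC50 values given in nM."""
    synthesized = len(potencies_nm)
    hits = sum(1 for p in potencies_nm if p < HIT_THRESHOLD_NM)
    percent = 100.0 * hits / synthesized if synthesized else 0.0
    return hits, synthesized, percent


def outcome_class(most_potent_nm: float) -> str:
    """Label the most potent design as 'nM' (< 10 nM) or 'µM' (otherwise)."""
    return "nM" if most_potent_nm < NM_CLASS_THRESHOLD_NM else "µM"
```

For example, with four hypothetical potencies `[5.0, 2000.0, 15000.0, 50000.0]` (in nM), `hit_rate` counts 2 of 4 as hits. A most potent value of exactly 10 nM falls on the boundary and is labelled "µM" by this strict comparison.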
Examples are presented in chronological order based on the final paper publication date.
Many papers were first pre-printed on ChemRxiv, BioRxiv, or ArXiv, but for ease of organization the final paper publication date is used. The only exception is when the paper is still at the pre-print stage, which is the case for many goal-oriented generation examples because they are so recent (as of writing this statement in June 2024).
These examples pre-train on a dataset and/or fine-tune on a set of known actives. Molecules are then sampled from the fine-tuned model.
Publication Date: January 10, 2018 - Paper Link
Target: RXR - Design Task: De novo design
Model: LSTM RNN (Input: SMILES, Output: SMILES)
Hit Rate: 4/5 (80%)
Outcome: nM agonist - Most Potent Design: EC50 RXRγ = 60 ± 20 nM (N = 4 assay replicates)
2. Tuning artificial intelligence on the de novo design of natural-product-inspired retinoid X receptor modulators
Publication Date: October 22, 2018 - Paper Link
Target: RXR - Design Task: De novo design
Model: LSTM RNN (Input: SMILES, Output: SMILES)
Hit Rate: 2/4 (50%)
Outcome: µM agonist - Most Potent Design: EC50 RXRβ = 15.7 ± 0.8 µM (59 ± 5 SEM) (N = at least 2 assay replicates)
3. Discovery of Highly Potent, Selective, and Orally Efficacious p300/CBP Histone Acetyltransferases Inhibitors
Publication Date: January 7, 2020 - Paper Link
Target: p300/CBP histone acetyltransferases (HAT) - Design Task: De novo design
Model: LSTM RNN (Input: SMILES, Output: SMILES)
Hit Rate: 1/1 (100%)
Outcome: Borderline nM inhibitor with in vivo validation - Most Potent Design: IC50 p300 = 10 nM
Notes: Only 1 generated molecule was synthesized. Further manual SAR resulted in a more potent design with in vivo validation
4. A Novel Scalarized Scaffold Hopping Algorithm with Graph-Based Variational Autoencoder for Discovery of JAK1 Inhibitors
Publication Date: August 24, 2021 - Paper Link
Target: JAK1 - Design Task: Scaffold hopping
Model: GraphGMVAE (Input: Graph, Output: SMILES)
Hit Rate: 7/7 (100%)
Outcome: nM inhibitor - Most Potent Design: IC50 = 5.0 nM
Notes: The reference compound for scaffold hopping has an IC50 of 45 nM
Publication Date: June 11, 2021 - Paper Link
Target: LXR - Design Task: De novo design
Model: LSTM RNN (Input: SMILES, Output: SMILES)
Hit Rate: 17/25 (68%)
Outcome: µM agonist - Most Potent Design: EC50 LXRα = 0.21 ± 0.02 µM (N = 3 assay replicates)
Notes: Used "microfluidics platform for on-chip chemical synthesis"
Publication Date: June 24, 2021 - Paper Link
Target: RORγ - Design Task: De novo design
Model: LSTM RNN (Input: SMILES, Output: SMILES)
Hit Rate: 3/3 (100%)
Outcome: µM agonist - Most Potent Design: IC50 LXRα = 0.37 ± 0.05 µM (N = at least 4 assay replicates)
7. Discovery of Pyrazolo[3,4-d]pyridazinone Derivatives as Selective DDR1 Inhibitors via Deep Learning Based Design, Synthesis, and Biological Evaluation
Publication Date: January 13, 2022 - Paper Link
Target: DDR1 - Design Task: De novo scaffold-based decoration
Model: BiRNN encoder–decoder (Input: SMILES, Output: SMILES)
Hit Rate: 2/2 (100%)
Outcome: nM (borderline µM) inhibitor - Most Potent Design: IC50 = 10.2 ± 1.2 nM
Notes: The generated set was virtually screened and 2 compounds with the highest docking scores were synthesized. The authors further performed SAR studies.
Publication Date: March 30, 2022 - Paper Link
Target: MERTK - Design Task: Reaction based de novo design
Model: GRU RNN (Input: SMILES, Output: SMILES)
Hit Rate: 15/17 (88%)
Outcome: µM inhibitor - Most Potent Design: IC50 = 53.4 nM
Notes: RNN model generates building blocks compatible with selected reactions.
9. Recurrent neural network (RNN) model accelerates the development of antibacterial metronidazole derivatives
Publication Date: August 15, 2022 - Paper Link
Target: Bacteria - Design Task: De novo design
Model: GRU RNN (Input: SMILES, Output: SMILES)
Hit Rate: 0/1 (0%)
Outcome: µM inhibitor - Most Potent Design: IC50 S. aureus = 28.21 µM
Notes: 1 generated compound and 11 of its derivatives were synthesized. Within the 11 derivatives, 2 had IC50 < 10 μM. There were additional actives with a concrete potency measured > 10 μM.
10. Target-Focused Library Design by Pocket-Applied Computer Vision and Fragment Deep Generative Linking
Publication Date: October 18, 2022 - Paper Link
Target: CDK8 - Design Task: Fragment linking
Model: GGNN GNN (Input: Graph, Output: Graph)
Hit Rate: 9/43 (21%)
Outcome: nM inhibitor - Most Potent Design: IC50 = 6.4 nM (N = 3 assay replicates)
Notes: 2 rounds of generation. First round = 37 synthesized, second round = 6 synthesized. Second round takes the optimal inhibitor found from the first round and generates more linkers.
11. PCW-A1001, AI-assisted de novo design approach to design a selective inhibitor for FLT-3(D835Y) in acute myeloid leukemia
Publication Date: November 25, 2022 - Paper Link
Target: FLT-3 - Design Task: De novo design
Model: LSTM RNN (Input: SMILES, Output: SMILES)
Hit Rate: 1/1 (100%)
Outcome: µM inhibitor - Most Potent Design: IC50 = 764 nM
12. Leveraging molecular structure and bioactivity with chemical language models for de novo drug design
Publication Date: January 7, 2023 - Paper Link
Target: PI3Kγ - Design Task: De novo design
Model: LSTM RNN (Input: SMILES, Output: SMILES)
Hit Rate: 3/18 (17%)
Outcome: µM inhibitor - Most Potent Design: Kd = 63 nM (N = 2 assay replicates)
Notes: 16 molecules (not the top scoring) were purchased from commercial suppliers resulting in a Kd = 640 nM hit. 2 top scoring compounds were manually synthesized and the most potent design had Kd = 63 nM. Derivatives of the top scoring generated compounds were also synthesized resulting in a compound with IC50 = 6.5 nM.
13. Application of deep generative model for design of Pyrrolo[2,3-d] pyrimidine derivatives as new selective TANK binding kinase 1 (TBK1) inhibitors
Publication Date: February 5, 2023 - Paper Link
Target: TBK1 - Design Task: Fragment linking
Model: Transformer (Input: SMILES, Output: SMILES)
Hit Rate: 1/1 (100%)
Outcome: µM inhibitor - Most Potent Design: IC50 = 66.7 nM (N = 2 assay replicates)
Notes: 1 generated molecule was synthesized with IC50 = 66.7 nM. Further SAR studies resulted in more potent designs.
14. Accelerated Discovery of Macrocyclic CDK2 Inhibitor QR-6401 by Generative Models and Structure-Based Drug Design
Publication Date: February 8, 2023 - Paper Link
Target: CDK2 - Design Task: Fragment hopping/linking
Model: VAE and transformer (Input: SMILES, Output: SMILES)
Hit Rate: 17/23 (74%)
Outcome: nM inhibitor with in vivo validation - Most Potent Design: IC50 CDK2/E1 = 0.37 nM
Notes: The hit rate is not completely accurate, as the authors state that some modifications were made to the generated structures for ease of synthesis (the extent of this is unclear). 13 compounds were initially synthesized. The crystal structure for one compound was solved and then a second generation campaign to generate linkers to form macrocycles was performed. The final optimal compound was validated in vivo.
15. De Novo Design of Nurr1 Agonists via Fragment-Augmented Generative Deep Learning in Low-Data Regime
Publication Date: May 31, 2023 - Paper Link
Target: Nurr1 - Design Task: De novo design
Model: LSTM RNN (Input: SMILES, Output: SMILES)
Hit Rate: 2/6 (33%)
Outcome: µM agonist - Most Potent Design: EC50 = 0.07 µM
Notes: The model was fine-tuned with 1 known Nurr1 agonist which has an EC50 = 0.4 µM.
Publication Date: April 22, 2024 - Paper Link
Target: PPARγ - Design Task: De novo ligand- and structure-based design
Model: Graph transformer-LSTM RNN (Input: Graph, Output: SMILES) - model is named DRAGONFLY
Hit Rate: 2/6 (33%)
Outcome: µM agonist - Most Potent Design: PPARγ EC50 = 1.5 ± 0.2 µM and PPARδ EC50 = 0.24 ± 0.05 µM
Notes: The model was fine-tuned with 1 known Nurr1 agonist which has an EC50 = 0.4 µM.
17. Combining de novo molecular design with semiempirical protein–ligand binding free energy calculation
Publication Date: November 20, 2024 - Paper Link
Target: AChE - Design Task: De novo ligand- and structure-based design
Model: Used DRAGONFLY (Graph transformer-LSTM RNN (Input: Graph, Output: SMILES)) which was previously developed by the authors
Hit Rate: 1/1 (100%) - 6-step convergent synthesis
Outcome: From the paper: "Specifically, compound 2 showed 31.6% (±0.8%) inhibition at 30 μM and 11% (±2%) inhibition at 10 μM" - Most Potent Design: 31.6% inhibition at 30 μM
Notes: Explored chemical space around Huperzine A (known AChE inhibitor). Tried SMILES and SELFIES - generated 4 molecular libraries and filtered with a scoring function notably encompassing a bioactivity prediction model and RAScore (AiZynthFinder retrosynthesis model surrogate). Top molecules were docked with GOLD (proprietary software) and xTB (open-source semiempirical quantum chemistry software).
Publication Date: December 20, 2024 - Pre-print Link
Target: DYRK1A for Alzheimer's disease - Design Task: De novo structure-based design
Model: Used Hierarchical Graph Encoder-Decoder (Input: Graph, Output: Graph)
Hit Rate: 1/1 (100%)
Outcome: μM inhibitor - Most Potent Design: 41 ± 3 nM (triplicate assays)
Notes: Trained an ensemble of QSAR models for property prediction. In total, the generative model generated 5 batches of 10,000 molecules. After each generation cycle, the molecules were filtered with the QSAR models and similarity to known inhibitors. Molecules passing the filter were used for transfer learning on the model (by adding them to the initial training set and re-training). The best molecules (around 50) were docked (using Schrödinger Glide XP - proprietary software) and 1 was selected for synthesis. 247 analogues of the synthesized molecule were also proposed and assessed by the predictive models - in the end, 7 were synthesized. Since these analogues were based on the selected molecule and not generated, they are not included in the hit rate here.
These examples either pre-train a conditional generator or pre-train and then couple an optimization algorithm for tailored molecular generation. This information is additionally noted.
Publication Date: March 23, 2018 - Paper Link
Target: Kinases - Design Task: De novo design
Model: Differentiable Neural Computer (DNC) (Input: SMILES, Output: SMILES)
Optimization Algorithm Class: Reinforcement Learning
Hit Rate: 0 (see Notes)
Outcome: µM agonist - Most Potent Design: N/A since no generated molecules were directly synthesized.
Notes: An in-house library was screened to identify high-Tanimoto-similarity molecules to the generated set. Therefore, none of the generated molecules were directly experimentally validated.
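The similarity screen described in these Notes relies on the Tanimoto coefficient, |A ∩ B| / |A ∪ B| over fingerprint bits. Below is a minimal sketch, assuming fingerprints are represented as sets of on-bit indices. The fingerprint type and similarity cut-off used in the paper are not stated here, so the `threshold` value and the function names are hypothetical.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0  # convention chosen for this sketch: two empty fingerprints score 0
    return len(a & b) / len(a | b)


def most_similar(
    query: set[int],
    library: dict[str, set[int]],
    threshold: float = 0.7,  # hypothetical cut-off, not taken from the paper
) -> list[tuple[str, float]]:
    """Return (name, similarity) pairs at or above the threshold, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

In practice the fingerprints would come from a cheminformatics toolkit; plain sets are used here only to keep the example self-contained.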
Publication Date: September 4, 2018 - Paper Link
Target: JAK3 - Design Task: De novo design
Model: Adversarial Autoencoder (AAE) (Input: SMILES, Output: SMILES)
Optimization Algorithm Class: Conditional generation (conditioned on binding affinity/activity against JAK3)
Hit Rate: 1/1
Outcome: µM inhibitor - Most Potent Design: IC50 = 6.73 µM
Publication Date: September 2, 2019 - Paper Link
Target: DDR1 - Design Task: De novo design
Model: VAE (Input: SMILES, Output: SMILES)
Optimization Algorithm Class: Reinforcement learning
Hit Rate: 4/6
Outcome: nM inhibitor with in vivo validation - Most Potent Design: IC50 = 10 nM
Notes: Generated, synthesized, and performed in vitro and in vivo validation within 46 days.
4. Design and Synthesis of DDR1 Inhibitors with a Desired Pharmacophore Using Deep Generative Models
Publication Date: December 1, 2020 - Paper Link
Target: DDR1 - Design Task: De novo ligand-based design
Model: LSTM RNN (Input: SMILES, Output: SMILES) - Model is REINVENT
Optimization Algorithm Class: Reinforcement learning
Hit Rate: 4/6
Outcome: µM inhibitor - Most Potent Design: IC50 = 92.5 nM
Notes: Pharmacophore matching approach.
5. Generative and reinforcement learning approaches for the automated de novo design of bioactive compounds
Publication Date: October 18, 2022 - Paper Link
Target: EGFR - Design Task: De novo design
Model: Stack-GRU RNN (Input: SMILES, Output: SMILES)
Optimization Algorithm Class: Reinforcement learning
Hit Rate: 4/15
Outcome: µM inhibitor - Most Potent Design: IC50 = 210 nM
Publication Date: November 12, 2022 - Paper Link
Target: RIPK1 - Design Task: De novo design
Model: LSTM RNN (Input: SMILES, Output: SMILES)
Optimization Algorithm Class: Conditional generation
Hit Rate: 4/8
Outcome: µM inhibitor with in vivo validation - Most Potent Design: IC50 = 35.0 nM
Notes: The pre-trained model was fine-tuned via transfer learning and the generated set was virtually screened. This is an example of how generative design and virtual screening can be complementary.
7. AlphaFold accelerates artificial intelligence powered drug discovery: efficient discovery of a novel CDK20 small molecule inhibitor
Publication Date: January 10, 2023 - Paper Link
Target: CDK20 - Design Task: De novo structure-based design
Model: Chemistry42 (Input: Mixed, Output: Mixed). Mixed = SMILES, fingerprints, graphs
Optimization Algorithm Class: Reinforcement learning
Hit Rate: 6/13
Outcome: µM inhibitor - Most Potent Design: IC50 CDK20/CycT1 = 33.4 ± 22.6 nM
Notes: First experimentally validated example that used an AlphaFold structure for structure-based design. 2 rounds of generation. There were also additional actives with a concrete measured potency > 10 µM.
8. Discovery of Potent, Selective, and Orally Bioavailable Small-Molecule Inhibitors of CDK8 for the Treatment of Cancer
Publication Date: April 7, 2023 - Paper Link
Target: CDK8 - Design Task: De novo structure-based design
Model: Chemistry42 (Input: Mixed, Output: Mixed). Mixed = SMILES, fingerprints, graphs
Optimization Algorithm Class: Reinforcement learning
Hit Rate: 1/1
Outcome: nM inhibitor with in vivo validation - Most Potent Design: IC50 = 0.4 ± 0.1 nM
Notes: The molecule was further optimized by manual domain-expert SAR, ultimately resulting in in vivo validation.
Publication Date: August 9, 2023 - Paper Link
Target: KOR - Design Task: De novo structure-based design
Model: VAE (Input: SMILES, Output: SMILES)
Optimization Algorithm Class: Reinforcement learning
Hit Rate: 2/5
Outcome: µM antagonist - Most Potent Design: Ki = 6.46 μM
10. Discovery of novel and selective SIK2 inhibitors by the application of AlphaFold structures and generative models
Publication Date: August 15, 2023 - Paper Link
Target: SIK2 - Design Task: De novo structure-based design (core scaffold was fixed)
Model: Chemistry42 (Input: Mixed, Output: Mixed). Mixed = SMILES, fingerprints, graphs
Optimization Algorithm Class: Reinforcement learning
Hit Rate: 6/6
Outcome: µM inhibitor - Most Potent Design: IC50 = 0.023 μM
Notes: Used an AlphaFold structure for structure-based design.
11. Discovery of Novel and Potent Prolyl Hydroxylase Domain-Containing Protein (PHD) Inhibitors for The Treatment of Anemia
Publication Date: January 8, 2024 - Paper Link
Target: PHD enzymes - Design Task: De novo structure-based design
Model: Chemistry42 (Input: Mixed, Output: Mixed). Mixed = SMILES, fingerprints, graphs
Optimization Algorithm Class: Reinforcement learning
Hit Rate: 1/1
Outcome: nM inhibitor with in vivo validation - Most Potent Design: IC50 PHD2 = 4 nM
Notes: Further SAR studies were performed by domain-experts, ultimately leading to in vivo validation.
Publication Date: January 23, 2024 - Paper Link
Target: NLRP3 - Design Task: De novo design with an activity model
Model: GRU RNN-transformer (Input: SMILES, Output: SMILES)
Optimization Algorithm Class: Reinforcement learning
Hit Rate: 0 (see Notes)
Outcome: N/A - no generated molecules were directly synthesized and tested
Notes: 12 generated molecules were selected for docking and analysis of the binding poses short-listed two scaffolds. Derivatives were designed based on these two scaffolds, resulting in a nM inhibitor.
Publication Date: January 25, 2024 - Paper Link
Target: Neuraminidase (NA) of influenza A and B viruses - Design Task: De novo structure-based design
Model: GNN, specifically Attentive FP first described here (Input: Graph, Output: Graph)
Optimization Algorithm Class: Reinforcement learning (Q-learning)
Hit Rate: 2/9 (22%)
Outcome: μM inhibitor, antiviral activity with in vivo validation - Most Potent Design: Quoted from paper: "EC50 0.4 μM against A/St. Petersburg/63/2020, 0.29 μM against A/Vladivostok/2/2009, and 0.74 μM against B/Samara/32/2018 strains (Fig. 4A)"
Notes: in vivo validation was demonstrated.
Publication Date: February 13, 2024 - Pre-print Link
Target: KRAS - Design Task: De novo structure-based design
Model: Quantum Computer-LSTM RNN (Input: SMILES, Output: SMILES)
Optimization Algorithm Class: Classical optimizer
Hit Rate: 1/12 (8%)
Outcome: µM inhibitor - Most Potent Design: IC50 = 1.4 μM
Notes: First example of a quantum computer application with experimental validation. There were also additional actives with a concrete measured potency > 10 µM.
15. Generate What You Can Make: Achieving in-house synthesizability with readily available resources in de novo drug design
Publication Date: March 5, 2024 - Pre-print Link
Target: MGLL - Design Task: De novo design with an activity model
Model: Graph transformer (Input: Graph, Output: Graph)
Optimization Algorithm Class: Reinforcement learning
Hit Rate: 1/3 (33%)
Outcome: µM inhibitor - Most Potent Design: IC50 = 1 μM
Notes: Generation using in-house collection of building blocks.
Publication Date: March 8, 2024 - Paper Link 1 - Paper Link 2 - Blog Post
Target: TNIK - Design Task: De novo structure-based design
Model: Chemistry42 (Input: Mixed, Output: Mixed). Mixed = SMILES, fingerprints, graphs
Optimization Algorithm Class: Reinforcement learning
Hit Rate: Unknown
Outcome: nM inhibitor with in vivo validation - Most Potent Design: IC50 = 4.8 nM
Notes: Initial generation led to a nM potent compound but with poor ADMET properties resulting in high clearance in human and mice liver microsomes. Lead optimization led to improved ADMET properties and ultimately in vivo validation.
Notably, Phase 1 clinical trial results were reported and this molecule is the first generative design to progress to Phase 2 clinical trials.
Publication Date: March 11, 2024 - Paper Link
Target: Factor Xa - Design Task: Scaffold-based
Model: Attention-convolution layers (Input: Substructure vector, Output: SMILES)
Optimization Algorithm Class: Mixed-Integer NonLinear Programming from this Paper
Hit Rate: Unknown (see Notes)
Outcome: µM inhibitor - Most Potent Design: IC50 = 34.57 μM
Notes: 8 commercially available generated molecules were purchased. Only the most potent affinity was reported.
Publication Date: March 11, 2024 - Paper Link
Target: HAT1 and YTHDC1 - Design Task: De novo design
Model: Flow (Input: Geometry, Output: Geometry)
Optimization Algorithm Class: Conditional generation (conditioned on protein pocket)
Hit Rate: 0/2 (0%) and 0/3 (0%)
Outcome: µM inhibitors - Most Potent Design: For HAT1: IC50 = 72.36 ± 8.03 μM and for YTHDC1: IC50 = 32.60 ± 2.72 μM (N = 3 assay replicates)
Notes: There were also additional actives with a concrete measured potency > 10 µM.
19. Generative AI for designing and validating easily synthesizable and structurally novel antibiotics
Publication Date: March 22, 2024 - Paper Link
Target: Bacteria - Design Task: De novo design with an activity model
Model: Monte Carlo Tree Search (MCTS) (Input: Variable, Output: Variable). The activity model takes variable input/output.
Optimization Algorithm Class: Monte Carlo Tree Search (MCTS)
Hit Rate: 6/58 (10%) - 70 generated molecules were ordered from Enamine and 58 were successfully synthesized with purity > 90% in ~4 weeks.
Outcome: µM inhibitor with in vivo validation. 6 were bioactive against A. baumannii ATCC 19606R - Most Potent Designs: MIC ≤ 8 µg ml−1
Notes: Enforced chemical reactions as permitted transformations during generation.
20. Abstract 5727: ISM9682A, a novel and potent KIF18A inhibitor, shows robust antitumor effects against chromosomally unstable cancers
Publication Date: March 22, 2024 - Paper Link
Target: KIF18A - Design Task: De novo structure-based design
Model: Chemistry42 (Input: Mixed, Output: Mixed). Mixed = SMILES, fingerprints, graphs
Optimization Algorithm Class: Reinforcement learning
Hit Rate: Unknown (see Notes)
Outcome: in vivo validation.
Notes: 110 molecules were synthesized and tested - Source.
21. A dual diffusion model enables 3D molecule generation and lead optimization based on target pockets
Publication Date: March 26, 2024 - Paper Link
Target: CDK2 - Design Task: Lead optimization
Model: Diffusion (Input: Geometry, Output: Geometry)
Optimization Algorithm Class: Conditional generation (conditioned on protein pocket)
Hit Rate: 7/7
Outcome: nM inhibitor - Most Potent Design: IC50 CDK2/E1 = 0.090 nM
Notes: Original reference compound has an IC50 CDK2/E1 = 8.1 nM. 2 rounds of generation: 4 molecules were synthesized from round 1, resulting in the most potent design with IC50 CDK2/E1 = 0.253 nM. The second round focused on intra-linking the molecules, resulting in macrocycles. 3 were synthesized and the final most potent design has IC50 CDK2/E1 = 0.090 nM.
Publication Date: April 1, 2024 - Paper Link
Target: Polθ - Design Task: Fragment linking
Model: Chemistry42 (Input: Mixed, Output: Mixed). Mixed = SMILES, fingerprints, graphs
Optimization Algorithm Class: Reinforcement learning
Hit Rate: 4/6 (67%)
Outcome: µM inhibitor with in vivo validation - Most Potent Design: IC50 = 126.1 μM
Notes: Further SAR studies ultimately led to in vivo validation.
23. Quantum-assisted fragment-based automated structure generator (QFASG) for small molecule design: an in vitro study
Publication Date: April 3, 2024 - Paper Link
Target: CAMKK2 and ATM - Design Task: Fragment linking
Model: Chemistry42 (Input: Mixed, Output: Mixed). Mixed = SMILES, fingerprints, graphs
Optimization Algorithm Class: Reinforcement learning
Hit Rate: 2/3 (66%) and 1/3 (33%)
Outcome: µM inhibitors - Most Potent Design: For CAMKK2: IC50 = 3 μM and for ATM: IC50 = 4 μM
24. Identification of SARS-CoV-2 Mpro inhibitors through deep reinforcement learning for de novo drug design and computational chemistry approaches
Publication Date: April 29, 2024 - Paper Link
Target: SARS-CoV-2 - Design Task: De novo design
Model: LSTM RNN (Input: SMILES, Output: SMILES) - Model is REINVENT
Optimization Algorithm Class: Reinforcement learning
Hit Rate: 1/16 (6%)
Outcome: µM inhibitor - Most Potent Design: IC50 = 3.27 μM (Figure 4)
Notes: Combined both distribution learning and goal-directed generation. 17 molecules were ordered from Enamine REAL with 16/17 successfully synthesized and tested.
25. Discovery of a Novel and Potent Cyclin-Dependent Kinase 8/19 (CDK8/19) Inhibitor for the Treatment of Cancer
Publication Date: May 1, 2024 - Paper Link
Target: CDK8/19 - Design Task: De novo structure-based design with some fixed moieties (based on known binder)
Model: Chemistry42 (Input: Mixed, Output: Mixed). Mixed = SMILES, fingerprints, graphs
Optimization Algorithm Class: Reinforcement learning
Hit Rate: N/A (see Notes)
Outcome: N/A (see Notes)
Notes: Compounds were manually designed based on generated molecules and 12 total compounds were synthesized. In the end, in vitro studies using murine CDX model for human mantle cell lymphoma showed IC50 = 1.34 nM. In vivo validation was achieved.
Publication Date: May 2, 2024 - Pre-print Link
Target: CDK8/19 - Design Task: De novo structure-based design with some fixed moieties (based on known binder)
Model: MPNN GNN (Input: Graph, Output: Graph)
Optimization Algorithm Class: Reinforcement learning (PPO)
Hit Rate: N/A (see Notes)
Outcome: N/A (see Notes)
Notes: 35 analogues of the generated molecules were synthesized. 23/35 have IC50 < 10 μM. The most potent design has IC50 = 0.43 μM.
27. NGT: Generative AI with Synthesizability Guarantees Identifies Potent Inhibitors for a G-protein Associated Melanocortin Receptor in a Tera-scale vHTS Screen
Publication Date: May 8, 2024 - Pre-print Link
Target: Melanocortin Type 2 Receptor (MC2R) - Design Task: De novo antagonist design using a surrogate model
Model: Combinatorial Synthesis Library Variational Auto-Encoder (CSLVAE) - first described here (Input: Graph, Output: Query vector to decode into molecule from library)
Optimization Algorithm Class: Reinforcement learning
Hit Rate: Among the 13/121 with > 50% inhibition at 30 µM, 5 were selected for further assays. 1/5 had EC50 < 10 µM. Therefore 1/121 (0.83%) had affinity < 10 µM
Outcome: µM antagonist - Most Potent Design: EC50 = 6.7 μM (Table S4)
Notes: This work is generative in a slightly different sense - the model decodes molecules from a dataset and can be seen as a combination of a generative and virtual screening method. 13/121 > 50% inhibition at 30 µM.
Publication Date: May 6, 2024 - Paper Link
Target: Dual specificity to MEK1 and mTOR - Design Task: De novo structure-based design with some fixed moieties (based on known binder)
Model: VAE (Input: SMILES, Output: SMILES)
Optimization Algorithm Class: Reinforcement learning - Hill-Climbing
Hit Rate: 19/32 (59%)
Outcome: µM inhibitor - Most Potent Design: IC50 between 1-10 μM (Fig. 6d)
Notes: The 4 most potent compounds achieved > 50% reduction in phosphorylation activity of both MEK1 and mTOR at 1 μM.
29. Synthetically Feasible De Novo Molecular Design of Leads Based on a Reinforcement Learning Model: AI-Assisted Discovery of an Anti-IBD Lead Targeting CXCR4
Publication Date: June 12, 2024 - Paper Link
Target: CXCR4 - Design Task: De novo structure-based antagonist design
Model: MLP (Input: SMILES, Output: SMILES)
Optimization Algorithm Class: Reinforcement learning (SAC)
Hit Rate: 20/20 (100%)
Outcome: nM competitive antagonism with in vivo validation - Most Potent Design: Antagonistic rate = 78.9 ± 6.2% at 10 nM (N = 3 assay replicates) - Table 1
Notes: Uses commercially available building blocks and molecular generation follows reaction templates. Uses AutoDock Vina as the docking protocol which is open-source. In vivo validation.
30. Accelerated Discovery of Carbamate Cbl-b Inhibitors Using Generative AI Models and Structure-Based Drug Design
Publication Date: August 12, 2024 - Paper Link
Target: Cbl-b - Design Task: De novo scaffold-based design
Model: LSTM RNN (Input: SMILES, Output: SMILES) - Model is LibINVENT which is part of REINVENT
Optimization Algorithm Class: Reinforcement learning
Hit Rate: N/A (see notes)
Outcome: N/A (see notes)
Notes: LibINVENT designed 2 molecules which were of interest after FEP validation. Small modifications of these 2 compounds were made and then synthesized. Both were active and the most potent had IC50 = 1.2 μM. A third compound resulted from chiral separation of one of the two synthesized compounds. This third compound was also tested, with IC50 = 37 μM. The insights from these first three compounds inspired the remaining design campaign.
31. Discovery of novel quinoline papain-like protease inhibitors for COVID-19 through topology constrained molecular generative model
Publication Date: September 13, 2024 - Pre-print Link
Target: Papain-like protease (PLpro) - Design Task: Scaffold hopping
Model: GNN with GCN and GGNN blocks (Input: 2D Graph, Output: 2D Graph) - Model is Tree-Invent and generation is autoregressive
Optimization Algorithm Class: Reinforcement learning (using REINVENT's loss function)
Hit Rate: 9/9
Outcome: µM inhibitor - Most Potent Design: IC50 PLpro = 0.0238 µM (Fig. 3b)
Notes: Based on the most potent Tree-Invent molecule (molecule 2 in the paper), a virtual screening library was created with commercial reagents. This library was screened using Glide docking and led to an experimentally validated nM potent compound. In vivo validation was achieved.
32. AutoDesigner - Core Design, a De Novo Design Algorithm for Chemical Scaffolds: Application to the Design and Synthesis of Novel Selective Wee1 Inhibitors
Publication Date: October 3, 2024 - Paper Link
Target: Wee1 with improved selectivity against PLK1 - Design Task: De novo scaffold-based design
Model: Enumeration
Optimization Algorithm Class: Filtering by property values
Hit Rate: 3/3 (100%)
Outcome: µM inhibitor but with selectivity against PLK1 - Most Potent Design: IC50 Wee1 = 58.3 nM and IC50 PLK1 > 10,000 μM (Table 4)
Notes: AutoDesigner is generative in a slightly different sense, in that it takes libraries of chemical moieties and attaches them, akin to enumeration. Uses relative free energy perturbation from Schrödinger (FEP+) combined with active learning.
33. Modern hit-finding with structure-guided de novo design: identification of novel nanomolar A2A receptor ligands using reinforcement learning
Publication Date: October 14, 2024 - Pre-print Link
Target: A2A Receptor Antagonist Design - Design Task: De novo structure-based design
Model: GRU RNN (Input: SMILES, Output: SMILES) - Model is Augmented Hill Climbing (AHC), which is a modification of REINVENT that adds Hill-climbing by backpropagating only on the top 50% best molecules (by reward) per sampled batch
Optimization Algorithm Class: Reinforcement learning
Hit Rate: 8/10 have Ki < 10 μM
Outcome: μM antagonist and selective against A2B
Notes: Used Glide as the docking software which is proprietary. Through Glide, hydrogen-bond constraints were enforced. For some protein targets, an additional occupancy constraint was enforced. Docking was performed against 7 known A2A structures. Oracle budget was 12,800 which is amongst the most constrained in case studies with experimental validation. 2 co-crystal structures obtained for the most potent ligands.
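The hill-climbing step mentioned in the Model line of this entry can be sketched as keeping only the best-scoring half of each sampled batch before the update. This is a minimal sketch of the selection step only, assuming a batch of (molecule, reward) pairs. The function name and the `top_fraction` parameter are hypothetical, and the REINVENT loss and the backpropagation itself are omitted.

```python
def select_top_fraction(
    batch: list[tuple[str, float]],
    top_fraction: float = 0.5,  # 0.5 mirrors the "top 50%" described in the entry
) -> list[tuple[str, float]]:
    """Keep the highest-reward fraction of a sampled batch, best first."""
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    if not batch:
        return []
    # Always keep at least one molecule so an update step is possible.
    k = max(1, int(len(batch) * top_fraction))
    ranked = sorted(batch, key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

Only the molecules returned by this selection would contribute to the loss in a hill-climbing style update; the rest of the batch is discarded for that step.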
Publication Date: March 15, 2024 - Pre-print Link, October 16, 2024 - Paper Link
Target: LTK - Design Task: De novo design - fragment-based assembly
Model: GAT GNN (Input: Geometry, Output: Geometry)
Optimization Algorithm Class: Conditional generation (conditioned on protein pocket)
Hit Rate: 3/3 (100%)
Outcome: µM inhibitor - Most Potent Design: IC50 = 75.4 nM
Publication Date: October 29, 2024 - Paper Link - February 1, 2024 - Pre-print Link
Target: Tuberculosis ClpP - Design Task: De novo design
Model: Transformer-VAE (Input: Geometry and SMILES, Output: SMILES)
Optimization Algorithm Class: Conditional generation
Hit Rate: 0/1 (0%)
Outcome: µM inhibitor - Most Potent Design: IC50 = 20.3 μM
Notes: Commercially available analogues were tested and were µM potent (IC50). Only 1 generated molecule was directly synthesized.
36. ClickGen: Directed exploration of synthesizable chemical space via modular reactions and reinforcement learning
Publication Date: November 22, 2024 - Paper Link
Target: PARP1 - Design Task: De novo design
Model: Based on U-Net encoder-decoder architecture (Input: SMILES, Output: SMILES)
Optimization Algorithm Class: Reinforcement Learning and Monte Carlo Tree Search (MCTS)
Hit Rate: 2/3 (67%) - 1/3 has a reported IC50 > 1,000 nM, with no concrete measurement provided (see Fig. 8)
Outcome: µM inhibitor - Most Potent Design: IC50 = 19.24 ± 1.63 nM
Notes: Used Schrödinger's computational chemistry software (proprietary) - pharmacophore matching and Glide docking.
37. Discovery of Pyridine-2-Carboxamides Derivatives as Potent and Selective HPK1 Inhibitors for the Treatment of Cancer
Publication Date: November 25, 2024 - Paper Link
Target: Hematopoietic progenitor kinase 1 (HPK1) - Design Task: Scaffold hopping
Model: Chemistry42 (Input: Mixed, Output: Mixed). Mixed = SMILES, fingerprints, graphs
Optimization Algorithm Class: Reinforcement learning
Hit Rate: 1/1 (100%)
Outcome: nM inhibitor - Most Potent Design: IC50 = 0.64 nM but moderate selectivity
Notes: Chemistry42 generated 1 molecule which performs a scaffold hop on the previously reported A-745 from AbbVie. Further SAR optimization led to in vivo validation.
Extended Notes: There is a series of 3 papers on HPK1 from Insilico Medicine, based on the following Press Release:
1: Target Validation using PandaOmics
3: Generative Design with further SAR
38. Intestinal mucosal barrier repair and immune regulation with an AI-developed gut-restricted PHD inhibitor
Publication Date: December 11, 2024 - Paper Link
Target: Hypoxia-inducible factor prolyl hydroxylase (PHD): Both PHD1 and PHD2 - Design Task: De novo structure-based design by fragment growth of privileged fragment
Model: Chemistry42 (Input: Mixed, Output: Mixed). Mixed = SMILES, fingerprints, graphs
Optimization Algorithm Class: Reinforcement learning
Hit Rate: Unclear. The paper states that 1 compound was generated and synthesized, which was then further SAR optimized. According to this source, approximately 115 molecules were synthesized in total, but some may not have been directly generated.
Outcome: nM inhibitor with in vivo validation - Most Potent Design: IC50 = 4 nM
Notes: The IC50 4 nM compound was generated and then further SAR optimization led to ISM012-042 (improved ADMET). In vivo validation was achieved.
39. Generative deep learning enables the discovery of phosphorylation-suppressed STAT3 inhibitors for non-small cell lung cancer therapy
Publication Date: - Paper Link
Target: STAT3 - Design Task: De novo design
Model: LSTM RNN (Input: SMILES, Output: SMILES) - same model as used here
Optimization Algorithm Class: Conditional generation
Hit Rate: Unclear - paper states 90 generated molecules were selected for synthesis with 2 possessing potent inhibitory activity at 1 μM
Outcome: From the paper: "The results demonstrated that HG106 and HG110 significantly suppressed colony formation in all tested NSCLC cell lines at a concentration of 1 μM Fig.4A."
Notes: Conditional generation resulted in a library of 15,678 generated molecules. As in the previous paper from which the model was adapted, the generated library was screened. Oracles include physico-chemical properties, docking (AutoDock 4.0), and MMGBSA.
40. Electron-density informed effective and reliable de novo molecular design and lead optimization with ED2Mol
Publication Date: - Pre-print Link
Target: 3 Targets: FGFR3 orthosteric inhibitors, CDC42 allosteric inhibitors, and GCK allosteric activator - Design Task: De novo design and lead optimization
Model: Uses a VAE and an equivariant GNN (EGNN) (Input: Graph/Geometry, Output: Graph/Geometry) - new model proposed in the paper is named ED2Mol
Optimization Algorithm Class: Conditional generation
Hit Rate/Outcome:
- FGFR3 Orthosteric Inhibitors: 50,000 molecules generated, filtered, and clustered. 5 molecules selected (unclear how many synthesized) -
- CDC42 Allosteric Inhibitors: 50,000 molecules generated, filtered, and clustered. 2 molecules selected for synthesis - IC50 = 47.58 ± 3.71 μM and 111.63 ± 0.90 μM.
- GCK Allosteric Activator: Based on a known compound, 10,000 molecules were generated, filtered, and clustered for lead optimization. 2 molecules selected for synthesis - EC50 = 290 nM and 150 nM. Original compound has an EC50 = 1810 nM.
Notes: One of the relatively few works using electron density to inform the generation of molecules.
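To put the GCK lead-optimization numbers above in perspective, the potency gain over the original compound can be computed as a simple ratio of EC50 values. The sketch below uses only the values reported in this entry. The helper name `fold_improvement` is hypothetical, and comparing raw EC50 ratios assumes the measurements come from comparable assays.

```python
def fold_improvement(reference_ec50_nm: float, design_ec50_nm: float) -> float:
    """Ratio of reference EC50 to design EC50 (values > 1 mean more potent)."""
    if reference_ec50_nm <= 0 or design_ec50_nm <= 0:
        raise ValueError("EC50 values must be positive")
    return reference_ec50_nm / design_ec50_nm


# EC50 values (nM) as listed in the entry above.
original_ec50_nm = 1810.0
design_ec50s_nm = [290.0, 150.0]

for ec50 in design_ec50s_nm:
    ratio = fold_improvement(original_ec50_nm, ec50)
    print(f"EC50 = {ec50:.0f} nM -> {ratio:.1f}-fold more potent than the original")
```

With these inputs the two designs come out at roughly 6-fold and 12-fold more potent than the starting compound.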