2. Table of Contents
• Introduction
• Retrosynthesis prediction
• Dataset description
• Overview of general approaches: Template-based, Template-free, Selection-based
• Proposed methods
• Classical computer-aided methods
• Machine learning based methods
• Challenges
• Practice
• RDKit
• OpenNMT
• Related works
• Future directions
• Reference
• Appendix
4. Retrosynthesis prediction
• What is retrosynthesis prediction?
• Retrosynthesis, or retrosynthetic pathway planning, is the process of tracing a forward reaction backward: predicting which reactants are required to synthesize the target product.
5. Retrosynthesis prediction
• Retrosynthesis is a crucial step in discovering new materials and drugs.
[Figure: Desired properties → Candidate product → Candidate reactants (via retrosynthesis prediction) → Test by chemist]
6. • Each step in discovering new materials and drugs introduces its own errors, so the results must be verified by a chemist.
• This verification is expensive.
[Figure: Desired properties → Candidate product → Candidate reactants (via retrosynthesis prediction) → Test by chemist]
Retrosynthesis prediction
8. Retrosynthesis prediction
• If retrosynthesis prediction can be done with high accuracy, it could unlock a fully automated material/drug discovery pipeline.
[Figure: Desired properties → Candidate product → Candidate reactants (via retrosynthesis prediction) → Test by robot]
9. Dataset description
• SMILES (Simplified Molecular-Input Line-Entry System) [1]
• SMILES is a specification in the form of a line notation for describing the structure of
chemical species [2].
• How SMILES is generated:
• By printing the symbol nodes encountered in a depth-first tree traversal of a chemical graph.
[1] Weininger et al. [2] https://en.wikipedia.org/wiki/Simplified_molecular-input_line-entry_system
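The depth-first generation rule above can be illustrated with a minimal Python sketch. It assumes an acyclic molecule with only single bonds and implicit hydrogens; rings, bond symbols and charges (next slide) are out of scope, and `write_smiles` is a hypothetical helper, not an RDKit function.

```python
from typing import Dict, List, Set


def write_smiles(atoms: List[str], adjacency: Dict[int, List[int]], root: int = 0) -> str:
    """Write a SMILES-like string for an acyclic, single-bonded heavy-atom graph.

    atoms[i] is the element symbol of atom i (hydrogens are implicit);
    adjacency[i] lists the neighbours of atom i. Rings, bond orders and
    charges are deliberately not handled in this sketch.
    """
    visited: Set[int] = set()

    def dfs(i: int) -> str:
        visited.add(i)
        out = atoms[i]
        children = [j for j in adjacency.get(i, []) if j not in visited]
        # All children except the last are written as parenthesised branches;
        # the last child continues the main chain.
        for j in children[:-1]:
            out += "(" + dfs(j) + ")"
        if children:
            out += dfs(children[-1])
        return out

    return dfs(root)


# Isopropanol: C0-C1, C1-C2, C1-O3
atoms = ["C", "C", "C", "O"]
adjacency = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
print(write_smiles(atoms, adjacency))          # CC(C)O
print(write_smiles(atoms, adjacency, root=3))  # OC(C)C
```

Starting the traversal from a different atom yields a different but equivalent string, which is why SMILES is not unique and canonical SMILES (for example from RDKit) are used when comparing predictions.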
10. Dataset description
• SMILES in detail
• Carbon (C) atom labels are omitted in the graph drawing.
• Hydrogen (H) atoms are omitted (implicit) in SMILES.
• Ring structures are written by breaking each ring at an arbitrary point to make an acyclic structure and adding numerical ring-closure labels to show connectivity between non-adjacent atoms.
• Branches are described with parentheses.
• A bond is represented using one of the symbols: . - = # $ : / \
• “.” indicates that two parts are not bonded together.
[1] Weininger et al. [2] https://en.wikipedia.org/wiki/Simplified_molecular-input_line-entry_system
11. Dataset description
• Benchmark:
1. USPTO (United States Patent and Trademark Office)
• The USPTO benchmark contains SMILES representations of single target products (input) and their reactants (target).
• Variants
• USPTO-50k
• USPTO-500K
• USPTO-MIT
2. Pistachio [32]
3. Reaxys [25]
[25] reaxys.com [32] Mayfield et al.
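For USPTO-style data, a retrosynthesis training pair can be derived from a reaction SMILES of the form `reactants>reagents>products`. The sketch below assumes one product per reaction and SMILES without atom-map numbers; `to_retro_pair` is a hypothetical helper, and canonicalization is not handled.

```python
from typing import List, Tuple


def to_retro_pair(rxn_smiles: str) -> Tuple[str, List[str]]:
    """Turn a reaction SMILES 'reactants>reagents>products' into a
    retrosynthesis example: (product, list of reactant SMILES).

    Raises ValueError if the string does not have exactly three '>'-separated fields.
    """
    reactants, _reagents, product = rxn_smiles.split(">")
    # "." separates molecules that are not bonded to each other.
    return product, reactants.split(".")


src, tgt = to_retro_pair("CC(=O)O.OCC>>CC(=O)OCC")
print(src)  # CC(=O)OCC
print(tgt)  # ['CC(=O)O', 'OCC']
```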
12. Overview of general approaches: Template-based
• Template-based approaches [2, 3, 4, 5, 14, 15, 16, 17] use known chemical reaction rules, called reaction templates.
• A reaction template contains subgraph reaction patterns describing how the reaction occurs between the reactants and the product.
• Pros
• High interpretability
• Cons
• Low generalizability to unseen templates
• Requires domain knowledge to extract the reaction templates
13. Overview of general approaches: Template-free
• Template-free approaches [6, 7, 8, 9, 10, 12] learn a mapping function from a product to a set of reactants by extracting features directly from data.
• Seq2Seq framework
• [6, 7, 8, 12]
• Graph2Graph framework
• [9, 10]
• Pros
• Generalizability
• No domain knowledge required
• Cons
• Invalid/Inaccessible predictions
• Low interpretability
14. Overview of general approaches: Selection-based
• Selection-based approaches [11] select reactants from a candidate set of purchasable molecules.
• The objective of [11] is to discover retrosynthetic routes from a given desired product to commercially available reactants.
• Pros
• Accessibility of the prediction
• No domain knowledge required
• Cons
• Limited novelty: only purchasable reactants can be proposed
[11] Guo et al.
[Figure: candidates in a purchasable pool are ranked by a learned function f(product)]
16. Classical computer-aided methods
• Before deep learning, computer-aided retrosynthesis was mainly conducted using reaction templates [2, 3, 4, 15, 16, 17].
• These methods focus on how to use known reactions and extract meaningful reaction context.
• Characteristics
• Requires chemical expertise.
• Heuristics
• Computationally expensive
• Chemical space is vast
• Subgraph isomorphism problem*1.
• Not scalable
• Not generalizable
*1: Appendix-1
17. Classical computer-aided methods
• The first computer-aided retrosynthesis:
• [18] Corey et al., “Computer-assisted analysis in organic synthesis.”, Science, 1985
• Corey received the 1990 Nobel Prize in Chemistry for his development of the theory and methodology of organic synthesis, including retrosynthetic analysis.
• [19] The Logic of Chemical Synthesis: Multistep Synthesis of Complex Carbogenic Molecules (Nobel lecture), 1991
[18, 19] Corey et al.
19. • Key Idea
• It uses product similarity and reactant similarity to rank the templates of precedent reactions.
[3] Coley et al.
Classical computer-aided methods:
Recent work [3] 2017 – Key Idea
20. • How to measure molecular similarity*2?
• Molecular fingerprints encode the structure of a molecule, typically as a bit vector; they can be computed with the RDKit library.
• The most common measure is Tanimoto similarity, but there is no canonical definition of molecular similarity (cf. the subgraph isomorphism problem*1).
• Tanimoto similarity: $T(x, y) = \frac{\sum_i x_i y_i}{\sum_i x_i + \sum_i y_i - \sum_i x_i y_i}$, where x, y: molecular fingerprints
*1: Appendix-1, *2: Appendix-2
Img from [20]
Classical computer-aided methods:
Recent work [3] 2017 – Method (Similarity)
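A minimal sketch of the Tanimoto similarity above, assuming each fingerprint is represented as the set of indices of its on bits (in practice the fingerprints would come from RDKit, which also provides `DataStructs.TanimotoSimilarity` for its bit vectors).

```python
from typing import Set


def tanimoto(x: Set[int], y: Set[int]) -> float:
    """Tanimoto similarity of two binary fingerprints, each given as the set
    of indices of its 'on' bits: |x & y| / (|x| + |y| - |x & y|)."""
    if not x and not y:
        return 0.0  # convention chosen for this sketch: two empty fingerprints
    common = len(x & y)
    return common / (len(x) + len(y) - common)


print(tanimoto({1, 2, 3}, {2, 3, 4}))  # 0.5
```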
21. • Example of using similarity in [3]
• Total similarity := product similarity × reactant (precursor) similarity
[3] Coley et al.
[Figure: candidate precursors ranked by total similarity]
Classical computer-aided methods:
Recent work [3] 2017 – Method (Using similarity)
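The total-similarity ranking can be sketched as follows. This is a simplification under stated assumptions, not the implementation of [3]: candidate precursors are assumed to be already generated by applying precedent templates, fingerprints are sets of on-bit indices, and a reactant set is represented by a single fingerprint (e.g. the union of its molecules' bits).

```python
from typing import List, Set, Tuple

Fingerprint = Set[int]


def tanimoto(x: Fingerprint, y: Fingerprint) -> float:
    if not x and not y:
        return 0.0
    common = len(x & y)
    return common / (len(x) + len(y) - common)


def rank_by_total_similarity(
    target: Fingerprint,
    candidates: List[Tuple[str, Fingerprint, Fingerprint, Fingerprint]],
) -> List[Tuple[str, float]]:
    """Each candidate is (name, precedent_product_fp, proposed_reactants_fp,
    precedent_reactants_fp); score = product sim * reactant (precursor) sim."""
    scored = []
    for name, prec_product, proposed, prec_reactants in candidates:
        product_sim = tanimoto(target, prec_product)
        reactant_sim = tanimoto(proposed, prec_reactants)
        scored.append((name, product_sim * reactant_sim))
    # Highest total similarity first
    return sorted(scored, key=lambda item: item[1], reverse=True)


ranked = rank_by_total_similarity(
    {1, 2, 3, 4},
    [
        ("b", {1, 2}, {8}, {8}),                 # 0.5 * 1.0 = 0.5
        ("a", {1, 2, 3, 4}, {5, 6}, {5, 6, 7}),  # 1.0 * 2/3 ≈ 0.667
    ],
)
print([name for name, _ in ranked])  # ['a', 'b']
```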
22. • Result of [3]
• [3] outperforms the seq2seq baseline. Note, however, that the seq2seq model in the table is template-free, whereas [3] is template-based.
• Contribution
• It mimics the retrosynthetic strategy using molecular similarity, without the need to encode any chemical knowledge.
• Limitation
• It inherently disfavors creative retrosynthetic strategies because it relies on precedent reactions.
*3: Appendix-3
Classical computer-aided methods:
Recent work [3] 2017 - Results
24. Machine learning based methods
• Data-driven methods using machine learning and deep learning have been actively developed since the mid-2010s.
• They reduce the need for chemical expertise.
• They are more scalable and generalizable.
• Representative proposed methods
• Template-based
• NeuralSim [14], Graph Logic Network (GLN) [5]
• Template-free
• Seq2Seq [21], Molecular Transformer (MT) [6, 7], Latent variable Transformer (LV-MT)
[8], Self-Corrected Transformer (SCROP) [22], Graph2Graph (G2G) [9], GraphRetro [10]
• Selection-based
• Bayesian-Retro [11]
26. • Template-based: NeuralSim [14] (2017)
• Key Idea
• Given a target product, it uses a neural network to predict the most suitable rule among the reaction templates.
[14] Segler et al.
Machine learning based methods
Template-based: NeuralSim [14] 2017 – Key Idea
27. • Template-based: NeuralSim [14]
• It uses simple models such as MLPs and Highway networks [23].
• It formulates rule selection as multiclass classification.
• The molecular descriptor [24] is defined as a sum of molecular fingerprints.
[14] Segler et al. [23] Srivastava et al. [24] Supporting information (PDF)
Machine learning based methods
Template-based: NeuralSim [14] 2017 - Method
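Treating rule selection as multiclass classification means the network outputs one logit per template and the top-k templates are read off a softmax. A minimal sketch, assuming the logits are already computed (the network itself is omitted; `top_k_templates` is a hypothetical name).

```python
import math
from typing import List, Tuple


def top_k_templates(logits: List[float], k: int) -> List[Tuple[int, float]]:
    """Softmax over per-template logits, then return the k most probable
    template indices with their probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in order[:k]]


print([i for i, _ in top_k_templates([0.0, 2.0, 1.0], k=2)])  # [1, 2]
```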
28. • Template-based: NeuralSim [14]
• Experiments
• Dataset: Reaxys database [25]
• Number of classes (rules): 8720
• Contribution
• It shows that neural networks can learn in which molecular contexts particular rules can be applied.
• Limitation
• Performance depends on the cardinality of the rule set.
• The larger the rule set, the lower the performance.
[14] Segler et al.
Machine learning based methods
Template-based: NeuralSim [14] 2017 - Results
29. • Template-based: Graph Logic Network (GLN) [5] (NeurIPS 2019)
[5] Dai et al.
Machine learning based methods
Template-based: Graph Logic Network [5] 2019
30. • Key Idea
• It models the joint distribution of reaction templates and reactants using logic variables.
• It learns when rules from reaction templates should be applied.
[5] Dai et al.
Machine learning based methods
Template-based: Graph Logic Network [5] 2019 – Key Idea
31. • Retrosynthesis Template
• Applying a retrosynthesis template can be decomposed into two logical steps:
• Match template
• Match reactants
[5] Dai et al.
Machine learning based methods
Template-based: Graph Logic Network [5] 2019 - Background
32. • Match template
• Match reactants
• Uncertainty
• Template score function
• Reactant score function
[5] Dai et al.
Machine learning based methods
Template-based: Graph Logic Network [5] 2019 - Method
33. • Final joint probability
[5] Dai et al. *4: Appendix-4
Machine learning based methods
Template-based: Graph Logic Network [5] 2019 - Method
Parameterized by a GNN (Graph Neural Network)*4
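For reference, the final joint probability can be written as follows. This is our reading of the formulation in [5], in our own notation: O is the target product, T a template, R a reactant set, φ_O(T) ∈ {0, 1} indicates that T matches O, φ_{O,T}(R) ∈ {0, 1} indicates that R matches the reactant side of T, and w_1, w_2 are the template and reactant score functions parameterized by GNNs.

```latex
p(T \mid O) = \frac{\exp\big(w_1(T, O)\big)\,\phi_O(T)}{\sum_{T'} \exp\big(w_1(T', O)\big)\,\phi_O(T')}
\qquad
p(R \mid T, O) = \frac{\exp\big(w_2(R, T, O)\big)\,\phi_{O,T}(R)}{\sum_{R'} \exp\big(w_2(R', T, O)\big)\,\phi_{O,T}(R')}

p(R, T \mid O) = p(T \mid O)\, p(R \mid T, O)
```

The indicator terms zero out templates and reactant sets that do not match, so probability mass is placed only on chemically consistent (template, reactant) pairs.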
34. • MLE with Efficient Inference
• Gradient approximation
Machine learning based methods
Template-based: Graph Logic Network [5] 2019 - Method
[5] Dai et al.
35. • Top-k results
• Contribution
• Interpretability: integration of probabilistic models and templates (chemical rules)
• Limitation
• It shares the limitations of template-based methods.
• Scalability
[5] Dai et al.
Machine learning based methods
Template-based: Graph Logic Network [5] 2019 - Results
36. [21] Liu et al.
Machine learning based methods
Template-free: Seq2Seq [21] 2017
37. • Template-free: Seq2Seq [21] (2017)
• It tokenizes SMILES and treats retrosynthesis as machine translation.
• It uses an LSTM encoder-decoder with a bidirectional LSTM encoder.
• It uses beam search to produce candidate sets of reactants.
[21] Liu et al.
Machine learning based methods
Template-free: Seq2Seq [21] 2017 - Method
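Tokenization is the first step of the sequence-to-sequence formulation. A minimal sketch using a regular expression of the kind commonly used with the Molecular Transformer [6]; treat the exact pattern as an assumption rather than the tokenizer of [21].

```python
import re
from typing import List

# Regex in the style commonly used for Molecular Transformer tokenization (assumed).
SMILES_REGEX = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>|\*|\$|\%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> List[str]:
    tokens = SMILES_REGEX.findall(smiles)
    # Tokenization must be lossless; otherwise some characters were not covered.
    if "".join(tokens) != smiles:
        raise ValueError(f"cannot tokenize {smiles!r}")
    return tokens


print(" ".join(tokenize_smiles("c1ccccc1[N+](=O)[O-]")))
# c 1 c c c c c 1 [N+] ( = O ) [O-]
```

Joining tokens with spaces produces the whitespace-tokenized text that OpenNMT-style training expects, with products on the source side and reactants on the target side.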
38. • Results
• It performs comparably to the rule-based expert system baseline.
• Contribution
• It shows that a fully data-driven seq2seq model can learn retrosynthetic reactions.
• Limitations
• It produces grammatically invalid SMILES and chemically implausible predictions.
• It is a straightforward application of a standard seq2seq model.
• Predictions generated by a vanilla seq2seq model with beam search typically exhibit low diversity, with only minor differences in the suffix [8].
[21] Liu et al., [8] Chen et al.
Machine learning based methods
Template-free: Seq2Seq [21] 2017 – Results
39. • Grammatically invalid SMILES
• Grammatically valid but chemically implausible
[21] Liu et al.
Machine learning based methods
Template-free: Seq2Seq [21] 2017 – Results
40. [6] Schwaller et al., [7] Lee et al.
Machine learning based methods
Template-free: Molecular Transformer [6, 7] 2019
41. • Key Idea
• Like [21], it tokenizes SMILES and treats retrosynthesis as machine translation.
• It uses a Transformer instead of an LSTM.
• It performs better than seq2seq [21] but has the same limitations.
Machine learning based methods
Template-free: Molecular Transformer [6, 7] 2019 – Key Idea
[6] Schwaller et al., [7] Lee et al. [21] Liu et al.
43. • It extends the Molecular Transformer (MT) to generalize better to rare reactions and to produce diverse predictions.
• Key Idea
• It proposes novel pretraining methods:
• Random bond cut
• Template-based bond cut
• It trains a mixture model with the online hard-EM algorithm.
[8] Chen et al.
Machine learning based methods
Template-free: LV-MT [8] 2019 – Key Idea
44. • Pretrain methods
• Random bond cut
• For each input target product, it generates new examples by selecting a random
bond to break.
• Template-based bond cut
• Instead of randomly breaking bonds, it uses the templates to break bonds.
• The model is pre-trained on these auxiliary examples, and then used as initialization
to be fine-tuned on the actual retrosynthesis data.
Machine learning based methods
Template-free: LV-MT [8] 2019 – Method (Pretrain)
[8] Chen et al.
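The random bond cut can be sketched on a molecular graph given as atom indices and a bond list. This is a simplified illustration under our own assumptions: it only shows the graph operation, whereas building actual pretraining examples would also require adjusting hydrogens and writing the fragments back to SMILES, which is omitted. `random_bond_cut` is a hypothetical helper.

```python
import random
from typing import Dict, List, Set, Tuple

Bond = Tuple[int, int]


def random_bond_cut(n_atoms: int, bonds: List[Bond], rng: random.Random) -> List[Set[int]]:
    """Break one randomly chosen bond and return the connected components
    (as sets of atom indices) of the resulting graph."""
    cut = rng.randrange(len(bonds))
    adjacency: Dict[int, Set[int]] = {i: set() for i in range(n_atoms)}
    for k, (a, b) in enumerate(bonds):
        if k != cut:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen: Set[int] = set()
    components: List[Set[int]] = []
    for start in range(n_atoms):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component: Set[int] = set()
        while stack:  # iterative DFS over the remaining bonds
            u = stack.pop()
            component.add(u)
            for v in adjacency[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        components.append(component)
    return components


rng = random.Random(0)
chain = [(0, 1), (1, 2), (2, 3)]            # acyclic: any cut yields two fragments
print(len(random_bond_cut(4, chain, rng)))  # 2
ring = [(0, 1), (1, 2), (2, 0)]             # ring bond: cutting only opens the ring
print(len(random_bond_cut(3, ring, rng)))   # 1
```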
45. • Why are latent variables introduced?
• It tackles the problem of generating diverse predictions.
• The outputs of beam search tend to be similar to each other.
• Given a target SMILES string x and a reactants SMILES string y, a mixture model introduces a multinomial latent variable z ∈ {1, · · · , K} to capture different reaction types, and decomposes the marginal likelihood as:
$p(y \mid x; \theta) = \sum_{z=1}^{K} p(z \mid x; \theta)\, p(y \mid z, x; \theta)$
Machine learning based methods
Template-free: LV-MT [8] 2019 – Method (Latent Var.)
[8] Chen et al.
46. • Hard-EM algorithm
1. Take a mini-batch of training examples.
2. Enumerate all K values of z and compute their losses.
• Dropout should be turned off [26].
3. For each (x, y), select the value of z that yields the minimum loss: $\hat{z} = \arg\min_{z} -\log\big(p(z \mid x; \theta)\, p(y \mid z, x; \theta)\big)$
• For p(y | z, x; θ), the encoder-decoder network is shared among mixture components, and the embedding of z is fed as an input to the decoder so that y is conditioned on it.
4. Back-propagate through the selected component, so only one component receives gradients per example.
• Dropout should be turned back on [26].
[8] Chen et al., [26] Shen et al.
Machine learning based methods
Template-free: LV-MT [8] 2019 – Method (Latent Var.)
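The selection step (3) of hard EM can be sketched as follows. `loss_fn` is a hypothetical stand-in for −log p(z | x; θ) p(y | z, x; θ) computed by the model with dropout off; the toy loss below exists only to make the example runnable.

```python
from typing import Callable, List, Tuple


def hard_em_select(
    batch: List[Tuple[str, str]],
    loss_fn: Callable[[str, str, int], float],
    num_components: int,
) -> List[int]:
    """E-step of hard EM: for each (x, y) pick the latent value z with the
    lowest loss. z is 0-indexed here (1..K on the slide)."""
    assignments = []
    for x, y in batch:
        losses = [loss_fn(x, y, z) for z in range(num_components)]
        assignments.append(min(range(num_components), key=losses.__getitem__))
    return assignments


def toy_loss(x: str, y: str, z: int) -> float:
    # Hypothetical loss that prefers z == len(y) mod 3.
    return abs(len(y) % 3 - z)


print(hard_em_select([("P", "AB"), ("P", "ABCD")], toy_loss, num_components=3))  # [2, 1]
```

In the M-step, the model would then be trained only on the triples (x, y, ẑ) with dropout turned back on, so each example updates a single component.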
47. • Results*5
*5: We report better hyper-parameters and the corresponding results in Appendix-5.
Machine learning based methods
Template-free: LV-MT [8] 2019 – Results
48. • Contributions
• It proposes novel pretraining methods for retrosynthesis.
• It uses a mixture-model Transformer for diverse predictions.
• Limitations
• The more latent classes (larger K) are used, the worse the top-1 performance.
• The latent variable does not appear to contain information about the reaction class.
Machine learning based methods
Template-free: LV-MT [8] 2019 – Results
[8] Chen et al.
49. • Template-free: Self-Corrected Transformer (SCROP) [22] (2020)
[22] Zheng et al.
Machine learning based methods
Template-free: SCROP [22] 2020
50. • Template-free: Self-Corrected Transformer (SCROP) [22] (2020)
• Key Idea
• It uses a Transformer to correct invalid predicted SMILES.
• It builds syntax-correction data with a trained Transformer by collecting pairs of invalid predictions and their ground truth.
• It trains a second Transformer as a syntax corrector on these data.
• At test time, the top-1 candidate produced by the syntax corrector replaces the original prediction.
[22] Zheng et al.
Machine learning based methods
Template-free: SCROP [22] 2020 – Key Idea
51. • Results
• Compared to the plain Transformer (SCROP-noSC), performance improves by 0.4–1.7%.
Machine learning based methods
Template-free: SCROP [22] 2020 – Results
[22] Zheng et al.
52. • Invalid SMILES rates
• Limitations
• Is SCROP necessary? Invalid SMILES can already be removed with RDKit, without a learned model.
[22] Zheng et al.
Machine learning based methods
Template-free: SCROP [22] 2020 – Results
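The limitation above suggests a simple alternative: filter the beam with a validity check. A minimal sketch, where `canonicalize` is any function returning a canonical SMILES or `None` for invalid input; the lookup table used below is hypothetical. With RDKit installed, `canonicalize` could wrap `Chem.MolFromSmiles` (which returns `None` for SMILES it cannot parse) followed by `Chem.MolToSmiles`.

```python
from typing import Callable, List, Optional


def clean_predictions(
    predictions: List[str], canonicalize: Callable[[str], Optional[str]]
) -> List[str]:
    """Drop invalid predictions (canonicalize returns None) and repeated ones,
    keeping the original beam order of first occurrences."""
    seen = set()
    cleaned = []
    for smiles in predictions:
        canonical = canonicalize(smiles)
        if canonical is None or canonical in seen:
            continue
        seen.add(canonical)
        cleaned.append(canonical)
    return cleaned


# Toy canonicalizer standing in for RDKit (hypothetical lookup table).
toy = {"OCC": "CCO", "CCO": "CCO", "CC": "CC"}.get
print(clean_predictions(["OCC", "C(C", "CCO", "CC"], toy))  # ['CCO', 'CC']
```

Because duplicates are removed after canonicalization, lower-ranked distinct candidates move up, which can raise top-k accuracy (as in Appendix-5).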
53. • Template-free: Graph2Graph (G2G) [9] (ICML 2020)
[9] Shi et al.
Machine learning based methods
Template-free: G2G [9] 2020
54. • Key Idea
• It decomposes retrosynthesis into a two-step procedure:
• Breaking the target product
• Transforming the broken target product
• It trains a Reaction Center Identification (RCI) module that produces synthon(s) by breaking bonds in the product graph.
• It trains a Variational Graph Translation (VGT) module that produces reactants via a series of graph transformations.
Machine learning based methods
Template-free: G2G [9] 2020 – Key Idea
[9] Shi et al.
55. • Reaction Center Identification (RCI)
• It uses an R-GCN [27] to learn graph representations.
• Overview
1. Given a chemical reaction (product graph and reactant graphs), it derives a binary label matrix marking which atom pairs form the reaction center.
2. Compute node embeddings and a graph embedding.
3. To estimate the reactivity score of atom pair (i, j), the edge embedding is formed by concatenating several features.
4. The final reactivity score of the atom pair (i, j) is calculated by applying a sigmoid to an MLP output on the edge embedding: $s_{ij} = \sigma(\mathrm{MLP}(e_{ij}))$
5. The RCI is optimized by minimizing the binary cross-entropy between the predicted scores and the binary labels.
Machine learning based methods
Template-free: G2G [9] 2020 – Method (RCI)
[9] Shi et al. [27] Schlichtkrull et al.
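Steps 3 to 5 of the RCI can be sketched for a single atom pair. Assumptions: the edge embedding is the concatenation of the two node embeddings and the graph embedding, and a single linear layer replaces the MLP; the weights are hypothetical, and the real features used in [9] may differ.

```python
import math
from typing import List, Sequence


def sigmoid(t: float) -> float:
    # Numerically stable logistic function
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def reactivity_score(h_i: Sequence[float], h_j: Sequence[float], h_graph: Sequence[float],
                     w: Sequence[float], b: float) -> float:
    """s_ij = sigmoid(w . e_ij + b) with e_ij = [h_i ; h_j ; h_graph]
    (assumed features; a single linear layer stands in for the MLP)."""
    e_ij = list(h_i) + list(h_j) + list(h_graph)
    if len(w) != len(e_ij):
        raise ValueError("weight vector does not match edge-embedding size")
    return sigmoid(sum(wk * ek for wk, ek in zip(w, e_ij)) + b)


def binary_cross_entropy(scores: List[float], labels: List[int], eps: float = 1e-12) -> float:
    """Mean BCE between predicted reactivity scores and 0/1 reaction-center labels."""
    total = 0.0
    for s, y in zip(scores, labels):
        total -= y * math.log(s + eps) + (1 - y) * math.log(1 - s + eps)
    return total / len(scores)


s = reactivity_score([1.0], [0.5], [0.2], w=[0.0, 0.0, 0.0], b=0.0)
print(s)                               # 0.5
print(binary_cross_entropy([s], [1]))  # about 0.693 (ln 2)
```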
56. • Reactants generation via Variational Graph Translation (VGT).
1. It receives synthons from the RCI and transforms them into reactants.
2. It generates a sequence of graph transformation actions and applies them to the initial synthon graph.
• It models graph generation as a Markov Decision Process (MDP).
Machine learning based methods
Template-free: G2G [9] 2020 – Method (VGT)
[9] Shi et al.
57. • Reactants generation via Variational Graph Translation (VGT).
• Overview
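1. Let the transformation trajectory be $(a_1, \ldots, a_N)$. Given the initial synthon graph $G^{(0)}$, the graph transformation is deterministic once the trajectory is fixed, so
$p(G_R \mid G^{(0)}) = p(a_1, \ldots, a_N \mid G^{(0)})$
2. Let $G^{(t)}$ denote the graph after applying the sequence of actions $(a_1, \ldots, a_t)$ to $G^{(0)}$.
3. Leveraging the MDP assumption,
$p(a_t \mid G^{(0)}, a_1, \ldots, a_{t-1}) = p(a_t \mid G^{(t-1)})$
4. Finally, the graph transformation can be factorized as follows:
$p(G_R \mid G^{(0)}) = \prod_{t=1}^{N} p(a_t \mid G^{(t-1)})$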
Machine learning based methods
Template-free: G2G [9] 2020 – Method (VGT)
[9] Shi et al.
58. • Reactants generation via Variational Graph Translation (VGT).
• Overview (cont’d)
5. Let each action be a tuple of a termination decision, a node selection, and an edge label.
6. It decomposes the action distribution into three parts:
i. Termination prediction
ii. Node selection
iii. Edge labeling
7. It uses variational inference by introducing an approximate posterior.
[9] Shi et al.
Machine learning based methods
Template-free: G2G [9] 2020 – Method (VGT)
59. • Top-k result
[9] Shi et al.
[Table columns: reaction class given | reaction class unknown]
Machine learning based methods
Template-free: G2G [9] 2020 – Results
60. • Module performance
• Contribution
• It formulates retrosynthesis prediction as a graph-to-graphs translation task, a novel formulation.
• Limitation
• A well-tuned Molecular Transformer performs better.
Machine learning based methods
Template-free: G2G [9] 2020 – Results
[9] Shi et al.
61. • Template-free: GraphRetro [10] (arXiv 2020)
Machine learning based methods
Template-free: GraphRetro [10] 2020
[10] Somnath et al.
62. • Template-free: GraphRetro [10] (arXiv 2020)
• Key Idea
• Like G2G [9], it breaks the product graph and then modifies the resulting pieces.
• G2G [9] modifies the graph at the level of atoms, whereas GraphRetro operates at the level of molecular fragments called leaving groups.
• G2G: Sequential generation
• GraphRetro: Leaving group selection
Machine learning based methods
Template-free: GraphRetro [10] 2020 – Key Idea
[10] Somnath et al.
63. • Top-k result
Machine learning based methods
Template-free: GraphRetro [10] 2020 - Results
[10] Somnath et al.
64. • Module performance
• Contribution
• Selecting leaving groups is an effective formulation of the retrosynthesis problem.
• Limitation
• Domain knowledge is required to create a leaving-group vocabulary.
Machine learning based methods
Template-free: GraphRetro [10] 2020 - Results
[10] Somnath et al.
67. Machine learning based methods
Selection-based: Bayesian Retrosynthesis [11] – Key Idea
• Key Idea
• It uses a pre-trained forward model to define the likelihood in Bayes’ theorem and approximates the posterior distribution over reactants.
• It uses Monte Carlo search to explore synthetic routes.
[11] Guo et al.
68. Machine learning based methods
Selection-based: Bayesian Retrosynthesis [11] – Method
• Method
• The likelihood is a Boltzmann distribution with inverse temperature β: $p(Y = y^* \mid X = x) \propto \exp\big(-\beta\, U(x, y^*)\big)$
• Energy function U(x, y*): Tanimoto distance between the target product y* and the product predicted from x by the forward model
• Approximate posterior
• Exact computation across all candidates is generally infeasible.
[Figure: product predicted by the forward model (Molecular Transformer)]
[11] Guo et al.
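The Boltzmann likelihood can be illustrated on a small, explicitly enumerated candidate set. This sketch assumes a uniform prior over candidates and replaces the Molecular Transformer forward model with a hypothetical lookup table; [11] instead samples from the posterior with SMC because exhaustive enumeration is infeasible. `posterior_over_candidates` is a hypothetical name.

```python
import math
from typing import Callable, Dict, List, Set

Fingerprint = Set[int]


def tanimoto_distance(x: Fingerprint, y: Fingerprint) -> float:
    if not x and not y:
        return 0.0
    common = len(x & y)
    return 1.0 - common / (len(x) + len(y) - common)


def posterior_over_candidates(
    target_fp: Fingerprint,
    candidates: List[str],
    forward_fp: Callable[[str], Fingerprint],
    beta: float,
) -> Dict[str, float]:
    """p(x | y*) ∝ p(x) exp(-beta * U(x, y*)) with a uniform prior p(x) over an
    explicitly enumerated candidate set; U is the Tanimoto distance between the
    target product and the product the forward model predicts for reactants x."""
    weights = {x: math.exp(-beta * tanimoto_distance(target_fp, forward_fp(x))) for x in candidates}
    z = sum(weights.values())
    return {x: w / z for x, w in weights.items()}


# Hypothetical forward model: a lookup from reactant set to predicted-product fingerprint.
forward = {"good": {1, 2, 3}, "bad": {7, 8}}.__getitem__
post = posterior_over_candidates({1, 2, 3}, ["good", "bad"], forward, beta=10.0)
print(round(post["good"], 5))  # 0.99995
```

A larger β concentrates the posterior on candidates whose predicted product matches the target more closely.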
69. Machine learning based methods
Selection-based: Bayesian Retrosynthesis [11] – Method (SMC)
• Method (Cont’d)
• Sampling from the posterior
• Sequential Monte Carlo (SMC)
• Cons
• Particle impoverishment [38]
• Rapid loss of diversity
• Computational cost of running the forward model (Molecular Transformer)
[11] Guo et al. [38] Stavropoulos et al.
70. Machine learning based methods
Selection-based: Bayesian Retrosynthesis [11] – Method
• Method (Cont’d)
• SMC accelerated by a surrogate likelihood
• It trains a gradient boosting regression tree that predicts the likelihood given by the Molecular Transformer.
[11] Guo et al.
73. Challenges
Challenge 1. Balancing between template-free and template-based model
Challenge 2. Multi-Step retrosynthesis
Challenge 3. Extremely large space of synthesis routes
Challenge 4. Molecule decoding (Graph generation)
[3] Coley et al. [14] Segler et al.
74. Challenges:
1. Balancing between template-free and template-based model
• Could a hybrid model use uncertainty to balance the two approaches?
[Figure: template-based (Pros: high interpretability; Cons: low generalizability, requires domain knowledge) vs. template-free (Pros: generalizability; Cons: invalid/inaccessible predictions, low interpretability)]
75. • Most real-world molecules cannot be synthesized in a single step.
• Syntheses can require up to 60 steps or even more.
• Error accumulation
• Extremely large search space
• The most recent work [13] uses neural-guided A* search.
[13] Chen et al.
Challenges:
2. Multi-Step retrosynthesis
76. • Each molecule could be synthesized from hundreds of different possible sets of reactants.
• How should the quality of a synthesis route be measured?
Challenges:
3. Extremely large space of synthesis routes
77. • Modeling complex distributions over graphs and then sampling from them efficiently is challenging!
• Why is it challenging?
• Non-unique representations (many node orderings describe the same graph)
• High-dimensional nature of graphs
• Complex, non-local dependencies between nodes and edges
• Proposed methods
• Graph VAE [29] (ICANN 2018)
• Graph RNN [30] (ICML 2018)
• GRAN [31] (NeurIPS 2019)
• Junction tree VAE [35] (ICML 2018)
[29] Simonovsky et al. [30] You et al. [31] Liao et al. [35] Jin et al.
Challenges:
4. Molecule decoding (Graph generation)
79. Practice: RDKit
• Data pre-processing (RDKit)
• RDKit [20] is an open-source cheminformatics library.
• https://www.rdkit.org
• Why RDKit?
• Visualization
• Substructure searching
• Molecular similarity calculation
• Validity checking
• Various other cheminformatics functions
• We have uploaded an RDKit tutorial notebook:
• https://github.com/wonjun-dev/contrastive-retro
80. Practice: OpenNMT
• OpenNMT
• OpenNMT [28] is an open-source toolkit for neural machine translation.
• https://opennmt.net
• Why OpenNMT?
• It supports various encoder-decoder models.
• Built-in functions
• Easy to engineer
• Cons
• Very large codebase
• Limited flexibility
• Disconnected pipeline (training, inference, and performance check are run separately)*7
[28] Klein et al., *7: We wrote a fully automated script.
81. Practice: OpenNMT – Where you should change
• OpenNMT
• Primary files in OpenNMT
• Data loader
• preprocess.py
• inputter.py (./onmt/inputters)
• Options
• opts.py (./onmt) => Options for training, translation, preprocessing, etc. You can add your own options here.
• Train
• train.py => Entry point of training
• train_single.py (./onmt) => Second entry point of training
• trainer.py (./onmt) => Main training loop
• loss.py (./onmt/utils) => Classes for loss functions
• Model
• model_builder.py (./onmt)
• model.py (./onmt/models) => Model class
• model_saver.py (./onmt/models)
• Translation
• translate.py => Entry point of translation
• translator.py (./onmt/translate) => Translator class
• Performance check
• parse_output.py (./parse) => Parses predicted outputs and calculates accuracy via RDKit.
82. Practice: OpenNMT – Automated script
• OpenNMT
• We provide a fully automated script (from training to parsing).
• https://github.com/wonjun-dev/contrastive-retro @master branch
• run_experiment_mt.sh
• Train – Inference (Translate) – Performance check (Parse) – Averaging
• arg[0]: GPU id
• arg[1]: seed
• run_average.py
• The performance of MT and LV-MT varies considerably depending on the random seed.
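The performance check and seed averaging can be sketched as follows. This is not the repository's `parse_output.py` or `run_average.py`; `top_k_accuracy` and `average_over_seeds` are hypothetical helpers, and predictions and targets are assumed to be canonical SMILES already (e.g. produced with RDKit).

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple


def top_k_accuracy(predictions: Sequence[Sequence[str]], targets: Sequence[str],
                   ks: Sequence[int] = (1, 3, 5, 10)) -> Dict[int, float]:
    """Fraction of examples whose target appears among the first k predictions."""
    hits = {k: 0 for k in ks}
    for preds, target in zip(predictions, targets):
        for k in ks:
            if target in preds[:k]:
                hits[k] += 1
    return {k: hits[k] / len(targets) for k in ks}


def average_over_seeds(runs: List[Dict[int, float]]) -> Dict[int, Tuple[float, float]]:
    """Mean and population standard deviation of each top-k accuracy across seeds."""
    return {k: (mean(r[k] for r in runs), pstdev([r[k] for r in runs])) for k in runs[0]}


seed_a = top_k_accuracy([["A", "B"], ["C", "D"]], ["B", "X"], ks=(1, 3))
seed_b = top_k_accuracy([["B", "A"], ["X", "D"]], ["B", "X"], ks=(1, 3))
print(seed_a)  # {1: 0.0, 3: 0.5}
print(average_over_seeds([seed_a, seed_b]))
```

Reporting the spread alongside the mean makes the seed sensitivity of MT and LV-MT visible instead of hiding it behind a single run.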
84. Related works
• Forward synthesis
• Given reactants and reagents, predict the products.
• [7, 34, 36, 37]
• Reaction center prediction
• The task of identifying the reaction center is related to the step of deriving the synthons
(intermediate outcomes) in retrosynthesis.
• [9, 10, 33, 34]
• Graph generation
• Generative models for real-world graphs, including social, chemical, and knowledge graphs
• [29, 30, 31, 35]
86. Future directions
• Training chemical language models like BERT
• Learning better chemical representation
• Atomic or molecular embedding considering chemical properties
• Robust to SMILES augmentation
• Contrastive learning
• Template-Generative Hybrid model
• Graph encoding – SMILES decoding
• Graph decoding is challenging
• Predictive model for subgraph isomorphism
• Subgraph isomorphism is NP-complete, so exact matching does not scale.
87. References
[1] Weininger et al. “SMILES, a chemical language and information system. 1. Introduction to methodology and encoding
rules.” Journal of Chemical Information and Computer Sciences, 1988.
[2] Christ et al. “Mining electronic laboratory notebooks: Analysis, retrosynthesis, and reaction based
enumeration.” Journal of Chemical Information and Modeling, 2012.
[3] Coley et al. “Computer-assisted retrosynthesis based on molecular similarity.” ACS Central Science, 2017.
[4] Klucznik et al. “Efficient syntheses of diverse, medicinally relevant targets planned by computer and executed
in the laboratory.” Chem, 2018.
[5] Dai et al. “Retrosynthesis prediction with conditional graph logic network”. NeurIPS, 2019.
[6] Schwaller et al. “Molecular transformer: A model for uncertainty-calibrated chemical reaction prediction.” ACS
Central Science, 2019.
[7] Lee et al. “Molecular transformer unifies reaction prediction and retrosynthesis across pharma chemical space.”
Chemical Communications, 2019.
[8] Chen et al. “Learning to make generalizable and diverse predictions for retrosynthesis.” arXiv preprint 2019.
[9] Shi et al. “A graph to graphs framework for retrosynthesis prediction.”, ICML, 2020
[10] Somnath et al. “Learning graph models for template-free retrosynthesis.”, arXiv, 2020
[11] Guo et al. “A Bayesian algorithm for retrosynthesis.”, arXiv, 2020
[12] Lin et al. “Automatic retrosynthetic route planning using template-free models.”, Chem. Sci., 2020
[13] Chen et al. “Retro*: Learning Retrosynthetic Planning with Neural Guided A* Search”, ICML, 2020
88. References
[14] Segler et al., “Neural-Symbolic machine learning for retrosynthesis and reaction prediction.”, Chemistry-A European
Journal, 2017
[15] Satoh et al., “A novel approach to retrosynthetic analysis using knowledge bases derived from reaction databases.”,
J. Chem. Inf. Comput. Sci., 1999
[16] Law et al., “Route designer: A retrosynthetic analysis tool utilizing automated retrosynthetic rule generation.”, J. Chem.
Inf. Model., 2009
[17] Gasteiger et al., “A collection of computer methods for synthesis design and reaction prediction.”, Recl. Trav. Chim.
Pays-Bas, 1992
[18] Corey et al., “Computer-assisted analysis in organic synthesis.”, Science, 1985
[19] Corey et al., “The logic of chemical synthesis: Multistep synthesis of complex carbogenic molecules. (Nobel lecture)”,
1991
[20] http://www.rdkit.org/UGM/2012/Landrum_RDKit_UGM.Fingerprints.Final.pptx.pdf
[21] Liu et al., “Retrosynthetic reaction prediction using neural sequence-to-sequence models.”, ACS Cent. Sci., 2017
[22] Zheng et al., “Predicting retrosynthetic reactions using self-corrected transformer neural networks.”, J. Chem. Inf.
Model., 2020
[23] Srivastava et al., “Highway networks”, NIPS, 2015
[24] https://chemistry-europe.onlinelibrary.wiley.com/action/downloadSupplement?doi=10.1002%2Fchem.201605499&file=chem201605499-sup-0001-misc_information.pdf
[25] http://www.reaxys.com, Reaxys is a registered trademark of RELX Intellectual Properties SA used under license.
[26] Shen et al., “Mixture models for diverse machine translation: Tricks of the trade.”, ICML, 2019
89. References
[27] Schlichtkrull et al., “Modeling relational data with graph convolutional networks.”, In European
Semantic Web Conference, 2018
[28] Klein et al., “OpenNMT: Open-Source Toolkit for Neural Machine Translation.”, arXiv, 2017
[29] Simonovsky et al., “GraphVAE: Towards Generation of Small Graphs Using Variational Autoencoders.”,
ICANN, 2018
[30] You et al., “GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models.”, ICML, 2018
[31] Liao et al., “Efficient Graph Generation with Graph Recurrent Attention Networks.”, NeurIPS, 2019
[32] Mayfield et al., “Pistachio 2.0 edn software.”, 2018
[33] Coley et al., “A graph-convolutional neural network model for the prediction of chemical reactivity.”,
Chemical Science 2019
[34] Jin et al., “Predicting organic reaction outcomes with Weisfeiler-Lehman Network.”, NeurIPS, 2017
[35] Jin et al., “Junction Tree Variational Autoencoder for molecular graph generation.”, ICML, 2018
[36] Bradshaw et al., “A generative model for electron paths.”, ICLR, 2019
[37] Do et al., “Graph transformation policy network for chemical reaction prediction.”, KDD, 2019
[38] Stavropoulos et al., “Sequential Monte Carlo methods in practice.”, Springer, 2001
90. Appendix
1. Subgraph isomorphism problem
• It is a computational task in which two graphs G and H are given as input, and one must determine whether G contains a subgraph that is isomorphic to H.
• NP-complete
2. Molecular similarity metrics (x and y are molecular fingerprints)
91. Appendix
3. Reaction class
• Meta-information about the type of a chemical reaction.
• In USPTO-50k, there are 10 reaction classes.
93. Appendix
5. Better hyper-parameters of MT and the results.
• Dropout p=0.25 is better than p=0.1
• We can remove invalid and repeated SMILES via RDKit.
• Also, using 6 layers with a higher dropout rate is better than using 4 layers.
Model | Top 1 | Top 3 | Top 5 | Top 10
MT [8] | 0.420 | 0.570 | 0.619 | 0.657
MT (p=0.25, w/o invalid/repeated) | 0.432 | 0.645 | 0.709 | 0.771