CINF 13, ACS Fall 2017, Washington, D.C.
pistachio
Search and Faceting of Large Reaction Databases
John Mayfield, Daniel Lowe, Roger Sayle
What do Synthetic Chemists Want from Their
Reaction Systems?
Data, Classification, Diagrams, Search
HazELNut Filbert NameRXN Cobnut
Accelrys
Pipeline Pilot
(AstraZeneca, AbbVie
& Hoffmann-La Roche)
ChemAxon
JChem Cartridge
(GlaxoSmithKline
& Novartis)
Elsevier Reaxys
(Hoffmann-La Roche,
AstraZeneca, Merck)
Perkin Elmer Informatics
(formerly CambridgeSoft)
eNotebook v9, v11 or v13
or Symyx ELN v5.x or v6.x
Oracle Server
version 10, 11 or
Microsoft Windows, Linux or Mac OS
Infrastructure for liberating and processing
reactions from Electronic Lab Notebooks (ELNs)
To 7-chloro-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid (Peakdale) (220 mg, 1.025 mmol) and (3,4-dimethoxyphenyl)boronic acid (187 mg, 1.025 mmol) in 1,4-dioxane (3 mL) and water (1.5 mL) was added sodium carbonate (435 mg, 4.10 mmol) and tetrakis(triphenylphosphine)palladium(0) (110 mg, 0.095 mmol). The reaction was heated in the microwave at 80° C. for 2 hours and at 100° C. for a further 2 hours. The solvent was removed and the residue was suspended in DMSO, filtered and purified by MDAP. Appropriate fractions were combined and the solvent removed to give 7-(3,4-dimethoxyphenyl)-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid (25 mg, 7%) as a yellow solid.
US 2016/16966 A1 [0517]
Daniel M. Lowe. Extraction of chemical structures and reactions from the literature. Ph.D. Thesis,
University of Cambridge, 2012
Product Properties
  7-(3,4-dimethoxyphenyl)-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid: 25 mg, 7% yield, yellow solid
Reactant Properties
  7-chloro-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid: 220 mg, 1.025 mmol
  (3,4-dimethoxyphenyl)boronic acid: 187 mg, 1.025 mmol
Agent Properties
  1,4-dioxane: 3 mL
  water: 1.5 mL
  sodium carbonate: 435 mg, 4.10 mmol
  tetrakis(triphenylphosphine)palladium(0): 110 mg, 0.095 mmol
  DMSO
Unstructured text to a structured reaction table
US 2016/16966 A1
LeadMine + Chemical Tagger
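The step from free text to table rows can be illustrated with a deliberately small sketch. It is not LeadMine or ChemicalTagger: it only pulls parenthesised amounts such as "(220 mg, 1.025 mmol)" out of a sentence with regular expressions. The patterns, the unit list and the function name `extract_amounts` are assumptions made for this example, and attaching each amount to the chemical name before it (the hard part) is left out.

```python
import re

# One quantity: a number followed by a unit, e.g. "220 mg", "1.025 mmol", "7%".
# The unit list is an assumption chosen to cover the example paragraph.
QUANTITY = re.compile(r"(\d+(?:\.\d+)?)\s*(mg|g|mmol|mol|mL|L|%)")

# Any parenthesised group without nested parentheses.
PAREN = re.compile(r"\(([^()]*)\)")


def extract_amounts(text):
    """Return one list of (value, unit) pairs per parenthesised amount group.

    Groups such as "(Peakdale)", "(3,4-dimethoxyphenyl)" or "(0)" are skipped
    because not every comma-separated part is a number plus unit.
    """
    groups = []
    for match in PAREN.finditer(text):
        parts = [part.strip() for part in match.group(1).split(",")]
        parsed = [QUANTITY.fullmatch(part) for part in parts]
        if all(parsed):
            groups.append([(float(m.group(1)), m.group(2)) for m in parsed])
    return groups


sentence = (
    "To 7-chloro-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid "
    "(Peakdale) (220 mg, 1.025 mmol) and (3,4-dimethoxyphenyl)boronic acid "
    "(187 mg, 1.025 mmol) in 1,4-dioxane (3 mL) and water (1.5 mL) was added "
    "sodium carbonate (435 mg, 4.10 mmol) and "
    "tetrakis(triphenylphosphine)palladium(0) (110 mg, 0.095 mmol)."
)
for group in extract_amounts(sentence):
    print(group)
```

Requiring every part of a group to be a quantity is what keeps locants inside chemical names from being mistaken for amounts.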
Christos Nicolaou et al. The Proximal Lilly Collection: Mapping, Exploring and Exploiting
Feasible Chemical Space J. Chem. Inf. Model., 2016, 56 (7), pp 1253–1266
Nadine Schneider et al. Big Data from Pharmaceutical Patents: A Computational Analysis of
Medicinal Chemists’ Bread and Butter. J. Med. Chem., 2016, 59 (9), pp 4385–4402
Nadine Schneider et al. Development of a Novel Fingerprint for Chemical Reactions and Its
Application to Large-Scale Reaction Classification and Similarity J. Chem. Inf.
Model., 2015, 55 (1), pp 39–53
Nadine Schneider et al. What’s What: The (Nearly) Definitive Guide to Reaction Role
Assignment. J. Chem. Inf. Model., 2016, 56 (12), pp 2336–2346
Connor Coley et al. Prediction of Organic Reaction Outcomes Using Machine Learning. ACS
Cent. Sci., 2017, 3 (5), pp 434–443
Data impact
Public subset released in 2014 as CC-Zero
Pistachio expands the scope of the data and uses Atom-Atom Maps from NameRxn
Example 26. Epizyme Inc. 1-phenoxy-3-(alkylamino)-propan-2-ol derivatives as CARM1 inhibitors and uses thereof (US 09718816 B2), Aug. 1, 2017
[Figure: sketch from Example 26, US 09718816 B2, labelled Step 1, Step 2, Step 3, Step 4, etc.]
John May, et al. Sketchy Sketches: Hiding Chemistry in Plain Sight. Seventh Joint Sheffield Conference on Cheminformatics. 2016
sketch extraction
NextMove’s Praline
total reactions over time
[Chart: cumulative reaction details by year, 1997 to 2018, on a scale of 0 to 3.5M; series: EPO Applications, EPO Grants, USPTO Applications, USPTO Grants]
reaction DIAGRAMS
Good reaction diagrams are essential in
communicating synthetic chemistry
Layout can be stored or generated
• When extracting from text, layout must be generated
• Generated diagrams can be unsatisfactory for display
[Figure: reaction depictions generated from SMILES for US 2016/16966 A1 [0517], as drawn by ChemDraw, OEChem, ChemAxon and BIOVIA]
diagram improvements
Typical workarounds:
• Separately render molecules
• Hide agents and list separately
What humans do:
• Wrap products below
• Abbreviate functional groups and agents
• Orient reactants to products and vice versa
• Hide agents and list as text
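Two of these habits, abbreviating agents and listing them as text, can be sketched without any drawing code. The snippet below is an illustration only, not how Pistachio or CDK does it: the abbreviation table and the function name `reaction_line` are hypothetical, and the output is a plain text line rather than a depiction.

```python
# Hypothetical abbreviation table using common chemist shorthand.
# It is not a list taken from Pistachio or CDK.
ABBREVIATIONS = {
    "tetrakis(triphenylphosphine)palladium(0)": "Pd(PPh3)4",
    "sodium carbonate": "Na2CO3",
    "water": "H2O",
}


def reaction_line(reactants, agents, products):
    """Format a reaction as text with abbreviated agents written on the arrow."""
    agent_text = ", ".join(ABBREVIATIONS.get(agent, agent) for agent in agents)
    return "{} --[{}]--> {}".format(
        " + ".join(reactants), agent_text, " + ".join(products)
    )


print(reaction_line(
    ["aryl chloride", "(3,4-dimethoxyphenyl)boronic acid"],
    ["tetrakis(triphenylphosphine)palladium(0)", "sodium carbonate",
     "1,4-dioxane", "water"],
    ["biaryl product"],
))
```

Agents missing from the table fall back to their full name, so an incomplete table degrades the line gracefully instead of dropping information.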
[Figure: depictions generated from SMILES for US 2016/16966 A1 [0517] by Pistachio+CDK (Abbreviated) and Pistachio+CDK (Abbreviated+Aligned)]
reaction detail view
namerxn
Assigns names to 900+ reactions using transformations
Can guarantee perfect Atom-Atom Mapping
• Atom-Atom Mapping is an output not an input
• MCS mappers struggle with rearrangements:
[Figure: 4.1.6 Cyclic Beckmann rearrangement]
concepts and rxno
1 Heteroatom alkylation and arylation
  .7 O-substitution
    .1 Chan-Lam ether coupling
    .2 Diazomethane esterification
    .3 Ethyl esterification
    .4 Hydroxy to methoxy
    .5 Hydroxy to triflyloxy
    .6 Methyl esterification
    .n
2 Acylation and related processes
  .6 O-acylation to ester
    .1 Ester Schotten-Baumann
    .2 Esterification (generic)
    .3 Fischer-Speier esterification
    .4 Baeyer-Villiger oxidation
    .5 Yamaguchi esterification
    .6 Hydroxy to imidazolecarbonyloxy
    .7 Imidazolecarbonyl to ester
    .8 Hydroxy to acetoxy
    .9 Steglich esterification
    .n
Esterification (7)
Chan-Lam coupling (3)
Schotten-Baumann Reaction (9)
RXNO: http://github.com/rsc-ontologies/rxno
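A three-level numbering like this makes it cheap to summarise results at any depth of the tree. The sketch below assumes that a leaf is written as a full dotted code, in the way "4.1.6 Cyclic Beckmann rearrangement" is; under that assumption "Chan-Lam ether coupling" would be 1.7.1. The function names `ancestors` and `roll_up` are made up for illustration and are not part of NameRxn.

```python
from collections import Counter


def ancestors(code):
    """Yield a dotted code and each of its parents: 1.7.1 -> 1.7.1, 1.7, 1."""
    parts = code.split(".")
    for depth in range(len(parts), 0, -1):
        yield ".".join(parts[:depth])


def roll_up(leaf_codes):
    """Count reactions at every level of the hierarchy in one pass."""
    counts = Counter()
    for code in leaf_codes:
        counts.update(ancestors(code))
    return counts


# Assumed codes for three classified reactions: two O-substitutions
# and one O-acylation to ester.
counts = roll_up(["1.7.1", "1.7.6", "2.6.3"])
for code in sorted(counts):
    print(code, counts[code])
```

Counting each reaction once per ancestor means a tree-shaped facet can show totals for "1 Heteroatom alkylation and arylation" and for its children from the same table.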
result FACETS
Provides summary over the key concepts of results
Cut through information deluge and refine search
• Reaction Types (NextMove ontology tree)
• Drug Targets (ChEMBL ontology tree)
• Disease Targets (MESH ontology tree)
• Yields
• Affiliation (NextMove ontology tree)
• Publication Date, Documents, Authors
facet calculation
Resource expensive: O(n) in the size of the result set
• Client, server, or database?
• Overhead copying and transferring data that is not needed
• Calculate when requested or up-front?
Custom cartridge: 2.9 seconds to summarise all 6.6 million rows (Intel(R) Core(TM) i7-6900K CPU @ 3.20GHz)
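The linear cost is easy to see in a minimal sketch: every row of the result set is visited once and each facet keeps a running count. This is plain Python over in-memory dictionaries, not the custom cartridge, and the row keys (`reaction_type`, `yield`), the 10%-wide yield buckets and the function names are assumptions made for the example.

```python
from collections import Counter


def yield_bucket(percent):
    """Bin a yield into 10%-wide buckets such as '0-9%'; 100% joins '90-100%'."""
    low = min(int(percent) // 10 * 10, 90)
    high = 100 if low == 90 else low + 9
    return f"{low}-{high}%"


def compute_facets(rows):
    """Single pass over the result rows, so the work grows with len(rows)."""
    facets = {"reaction_type": Counter(), "yield": Counter()}
    for row in rows:
        facets["reaction_type"][row["reaction_type"]] += 1
        if row.get("yield") is not None:
            facets["yield"][yield_bucket(row["yield"])] += 1
    return facets


rows = [
    {"reaction_type": "Methyl esterification", "yield": 7},
    {"reaction_type": "Methyl esterification", "yield": 80},
    {"reaction_type": "Steglich esterification", "yield": None},
]
facets = compute_facets(rows)
print(facets["reaction_type"].most_common())
print(facets["yield"].most_common())
```

Running the same loop on the client would mean shipping every row across the network first, which is the copying overhead the slide warns about.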
one entry point
Systematic Name, Date Range, Trivial Name,
Yield Range, Affiliation, Reaction SMARTS,
Disease Target, Document, Line Formula,
SMILES, InChI, Author, Protein Target, Collection,
Reaction Type (NameRxn), SMARTS, Source
…and logical combinations thereof
suggestions
Based on global frequency
Based on context frequency
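The difference between the two rankings can be sketched with one function applied to two different term lists: every term in the database for global frequency, or only the terms in the current result set for context frequency. This is a guess at the idea, not Pistachio's implementation; the function name `suggest` and the sample data are hypothetical.

```python
from collections import Counter


def suggest(prefix, terms, limit=3):
    """Rank the terms that start with `prefix` by how often they occur."""
    wanted = prefix.lower()
    counts = Counter(t for t in terms if t.lower().startswith(wanted))
    return [term for term, _ in counts.most_common(limit)]


# Hypothetical reaction-type annotations for all records...
global_terms = (
    ["Hydroxy to methoxy"] * 3
    + ["Hydroxy to acetoxy"] * 2
    + ["Hydroxy to triflyloxy"]
)
# ...and for the records matching the user's current search.
context_terms = ["Hydroxy to triflyloxy"] * 2 + ["Hydroxy to methoxy"]

print(suggest("hydroxy", global_terms))
print(suggest("hydroxy", context_terms))
```

The same prefix yields a different first suggestion once the counts come from the current results rather than the whole collection.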
structure search technology
NextMove’s Arthor Technology
Up to 100x faster than state-of-the-art
Combination of SMARTS
compilation and efficient storage
Preliminary PostgreSQL integration
36s Arthor
56m BIOVIA Direct (Oracle)
1h Bingo (NoSQL)
1h54m Bingo (PostgreSQL)
2h6m Bingo (Oracle)
2h41m JChem (Oracle)
5h9m RDCart (PostgreSQL)
13h54m pgchem (PostgreSQL)
1d1h52m mychem (MySQL)
3d1h13m orchem (Oracle)
Benchmark: ~3.5K queries against ~7M structures (eMolecules 2014) all on the same
hardware.
John May and Roger Sayle, Substructure Search Face-off, May 2015
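To compare the timings on a common scale, the compact durations can be converted to seconds and divided by the Arthor figure. The sketch below assumes "d", "h", "m" and "s" mean days, hours, minutes and seconds; the helper name `to_seconds` is made up for this example, and only the numbers listed above are used.

```python
import re

# Assumed meaning of the unit letters in the benchmark list.
UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def to_seconds(duration):
    """Convert a compact duration such as '1d1h52m' or '36s' to seconds."""
    pairs = re.findall(r"(\d+)([dhms])", duration)
    return sum(int(number) * UNIT_SECONDS[unit] for number, unit in pairs)


timings = {
    "Arthor": "36s",
    "BIOVIA Direct (Oracle)": "56m",
    "Bingo (NoSQL)": "1h",
    "Bingo (PostgreSQL)": "1h54m",
    "Bingo (Oracle)": "2h6m",
    "JChem (Oracle)": "2h41m",
    "RDCart (PostgreSQL)": "5h9m",
    "pgchem (PostgreSQL)": "13h54m",
    "mychem (MySQL)": "1d1h52m",
    "orchem (Oracle)": "3d1h13m",
}

baseline = to_seconds(timings["Arthor"])
for name, duration in timings.items():
    ratio = to_seconds(duration) / baseline
    print(f"{name}: {to_seconds(duration)} s, {ratio:.0f}x Arthor")
```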
Intention can be refined by qualifiers
Role
{structure} product
Substructure
{structure} substructure
{structure} substructure product
Make/Break
Synthesis of {structure}
Combined with other terms
{structure} substructure product and yield of 80%
refining structure search
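How such qualifiers might be peeled off a query string can be shown with a small parser. It is a sketch under stated assumptions, not Pistachio's query grammar: the function name `parse_query`, the result keys, and the extra role words "reactant" and "agent" are hypothetical, and only the phrasings listed above are handled.

```python
import re

# "product" comes from the examples above; the other roles are assumed.
ROLES = {"product", "reactant", "agent"}


def parse_query(query):
    """Split a query into structure text, qualifiers and an optional yield."""
    parsed = {"structure": None, "substructure": False, "role": None,
              "make": False, "yield": None}

    # Optional trailing term: "... and yield of 80%".
    match = re.search(r"\s+and\s+yield\s+of\s+(\d+)%\s*$", query, re.IGNORECASE)
    if match:
        parsed["yield"] = int(match.group(1))
        query = query[:match.start()]

    # Leading "Synthesis of ..." marks a make query.
    match = re.match(r"\s*synthesis\s+of\s+", query, re.IGNORECASE)
    if match:
        parsed["make"] = True
        query = query[match.end():]

    # Peel qualifiers off the end: the role first, then "substructure".
    words = query.split()
    if words and words[-1].lower() in ROLES:
        parsed["role"] = words.pop().lower()
    if words and words[-1].lower() == "substructure":
        parsed["substructure"] = True
        words.pop()

    parsed["structure"] = " ".join(words)
    return parsed


print(parse_query("7H-purine substructure product and yield of 80%"))
print(parse_query("Synthesis of 7H-purine"))
```

Reading qualifiers from the end keeps the structure text free-form, so names, SMILES or SMARTS can all sit in the same position.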
Find: 7H-purine substructure product
Find: Synthesis of 7H-purine
make/break example
Find: 7H-purine-8-one substructure chlorination
Find: [*:1][CH2:2]Cl>>[*:1][CH2:2]F
Namerxn example
Acknowledgements
Noel O’Boyle (NextMove Software), Egon Willighagen (CDK)
James Davison, Matt Swain (Vernalis)
pistachio
http://www.nextmovesoftware.com/pistachio.html
Come find me around ACS for a demo!
See also: CINF 90

More Related Content

What's hot

企業における自然言語処理技術の活用の現場(情報処理学会東海支部主催講演会@名古屋大学)
企業における自然言語処理技術の活用の現場(情報処理学会東海支部主催講演会@名古屋大学)企業における自然言語処理技術の活用の現場(情報処理学会東海支部主催講演会@名古屋大学)
企業における自然言語処理技術の活用の現場(情報処理学会東海支部主催講演会@名古屋大学)
Yuya Unno
 

What's hot (20)

企業における自然言語処理技術利用の最先端
企業における自然言語処理技術利用の最先端企業における自然言語処理技術利用の最先端
企業における自然言語処理技術利用の最先端
 
企業における自然言語処理技術の活用の現場(情報処理学会東海支部主催講演会@名古屋大学)
企業における自然言語処理技術の活用の現場(情報処理学会東海支部主催講演会@名古屋大学)企業における自然言語処理技術の活用の現場(情報処理学会東海支部主催講演会@名古屋大学)
企業における自然言語処理技術の活用の現場(情報処理学会東海支部主催講演会@名古屋大学)
 
機械学習モデルのサービングとは?
機械学習モデルのサービングとは?機械学習モデルのサービングとは?
機械学習モデルのサービングとは?
 
ビッグデータ処理データベースの全体像と使い分け
2018年version
ビッグデータ処理データベースの全体像と使い分け
2018年versionビッグデータ処理データベースの全体像と使い分け
2018年version
ビッグデータ処理データベースの全体像と使い分け
2018年version
 
Enabling Data Science Methods for Catalyst Design and Discovery
Enabling Data Science Methods for Catalyst Design and DiscoveryEnabling Data Science Methods for Catalyst Design and Discovery
Enabling Data Science Methods for Catalyst Design and Discovery
 
最近のRのランダムフォレストパッケージ -ranger/Rborist-
最近のRのランダムフォレストパッケージ -ranger/Rborist-最近のRのランダムフォレストパッケージ -ranger/Rborist-
最近のRのランダムフォレストパッケージ -ranger/Rborist-
 
PubChem as a resource for chemical information education
PubChem as a resource for chemical information educationPubChem as a resource for chemical information education
PubChem as a resource for chemical information education
 
Graph Database Query Languages
Graph Database Query LanguagesGraph Database Query Languages
Graph Database Query Languages
 
Machine Learning with PyCarent + MLflow
Machine Learning with PyCarent + MLflowMachine Learning with PyCarent + MLflow
Machine Learning with PyCarent + MLflow
 
Debunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative FactsDebunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative Facts
 
Neptune, the Graph Database | AWS Floor28
Neptune, the Graph Database | AWS Floor28Neptune, the Graph Database | AWS Floor28
Neptune, the Graph Database | AWS Floor28
 
Intermediate Cypher.pdf
Intermediate Cypher.pdfIntermediate Cypher.pdf
Intermediate Cypher.pdf
 
Meta-Prod2Vec: Simple Product Embeddings with Side-Information
Meta-Prod2Vec: Simple Product Embeddings with Side-InformationMeta-Prod2Vec: Simple Product Embeddings with Side-Information
Meta-Prod2Vec: Simple Product Embeddings with Side-Information
 
ナレッジグラフ/LOD利用技術の入門(後編)
ナレッジグラフ/LOD利用技術の入門(後編)ナレッジグラフ/LOD利用技術の入門(後編)
ナレッジグラフ/LOD利用技術の入門(後編)
 
RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013
RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013
RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013
 
성장을 좋아하는 사람이, 성장하고 싶은 사람에게
성장을 좋아하는 사람이, 성장하고 싶은 사람에게성장을 좋아하는 사람이, 성장하고 싶은 사람에게
성장을 좋아하는 사람이, 성장하고 싶은 사람에게
 
大量のデータ処理や分析に使えるOSS Apache Spark入門(Open Source Conference 2021 Online/Kyoto 発表資料)
大量のデータ処理や分析に使えるOSS Apache Spark入門(Open Source Conference 2021 Online/Kyoto 発表資料)大量のデータ処理や分析に使えるOSS Apache Spark入門(Open Source Conference 2021 Online/Kyoto 発表資料)
大量のデータ処理や分析に使えるOSS Apache Spark入門(Open Source Conference 2021 Online/Kyoto 発表資料)
 
Neo4j Demo: Using Knowledge Graphs to Classify Diabetes Patients (GlaxoSmithK...
Neo4j Demo: Using Knowledge Graphs to Classify Diabetes Patients (GlaxoSmithK...Neo4j Demo: Using Knowledge Graphs to Classify Diabetes Patients (GlaxoSmithK...
Neo4j Demo: Using Knowledge Graphs to Classify Diabetes Patients (GlaxoSmithK...
 
『繋がり』を見る: Cytoscapeと周辺ツールを使ったグラフデータ可視化入門
『繋がり』を見る: Cytoscapeと周辺ツールを使ったグラフデータ可視化入門『繋がり』を見る: Cytoscapeと周辺ツールを使ったグラフデータ可視化入門
『繋がり』を見る: Cytoscapeと周辺ツールを使ったグラフデータ可視化入門
 
PostgreSQL のイケてるテクニック7選
PostgreSQL のイケてるテクニック7選PostgreSQL のイケてるテクニック7選
PostgreSQL のイケてるテクニック7選
 

Similar to CINF 13: Pistachio - Search and Faceting of Large Reaction Databases

The EPA Online Prediction Physicochemical Prediction Platform to Support Envi...
The EPA Online Prediction Physicochemical Prediction Platform to Support Envi...The EPA Online Prediction Physicochemical Prediction Platform to Support Envi...
The EPA Online Prediction Physicochemical Prediction Platform to Support Envi...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
Free online access to experimental and predicted chemical properties through ...
Free online access to experimental and predicted chemical properties through ...Free online access to experimental and predicted chemical properties through ...
Free online access to experimental and predicted chemical properties through ...
Kamel Mansouri
 
Review of some successes
Review of some successesReview of some successes
Review of some successes
Andrea Zaliani
 
Getting the Big Picture by Joining up the SAR dots
Getting the Big Picture by Joining up the SAR dotsGetting the Big Picture by Joining up the SAR dots
Getting the Big Picture by Joining up the SAR dots
Sorel Muresan
 
Practical 9 protein structure and function (3)
Practical 9 protein structure and function  (3)Practical 9 protein structure and function  (3)
Practical 9 protein structure and function (3)
Osama Barayan
 
The influence of data curation on QSAR Modeling – examining issues of qualit...
 The influence of data curation on QSAR Modeling – examining issues of qualit... The influence of data curation on QSAR Modeling – examining issues of qualit...
The influence of data curation on QSAR Modeling – examining issues of qualit...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
R Analytics in the Cloud
R Analytics in the CloudR Analytics in the Cloud
R Analytics in the Cloud
DataMine Lab
 
Delivering The Benefits of Chemical-Biological Integration in Computational T...
Delivering The Benefits of Chemical-Biological Integration in Computational T...Delivering The Benefits of Chemical-Biological Integration in Computational T...
Delivering The Benefits of Chemical-Biological Integration in Computational T...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 

Similar to CINF 13: Pistachio - Search and Faceting of Large Reaction Databases (20)

The EPA Online Prediction Physicochemical Prediction Platform to Support Envi...
The EPA Online Prediction Physicochemical Prediction Platform to Support Envi...The EPA Online Prediction Physicochemical Prediction Platform to Support Envi...
The EPA Online Prediction Physicochemical Prediction Platform to Support Envi...
 
Free online access to experimental and predicted chemical properties through ...
Free online access to experimental and predicted chemical properties through ...Free online access to experimental and predicted chemical properties through ...
Free online access to experimental and predicted chemical properties through ...
 
ICIC 2013 Conference Proceedings Sebastian Radestock
ICIC 2013 Conference Proceedings Sebastian RadestockICIC 2013 Conference Proceedings Sebastian Radestock
ICIC 2013 Conference Proceedings Sebastian Radestock
 
Open-source tools for querying and organizing large reaction databases
Open-source tools for querying and organizing large reaction databasesOpen-source tools for querying and organizing large reaction databases
Open-source tools for querying and organizing large reaction databases
 
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
 
Cheminformatics II
Cheminformatics IICheminformatics II
Cheminformatics II
 
Review of some successes
Review of some successesReview of some successes
Review of some successes
 
Getting the Big Picture by Joining up the SAR dots
Getting the Big Picture by Joining up the SAR dotsGetting the Big Picture by Joining up the SAR dots
Getting the Big Picture by Joining up the SAR dots
 
Websci17 final
Websci17 finalWebsci17 final
Websci17 final
 
Practical 9 protein structure and function (3)
Practical 9 protein structure and function  (3)Practical 9 protein structure and function  (3)
Practical 9 protein structure and function (3)
 
The influence of data curation on QSAR Modeling – Presented at American Chemi...
The influence of data curation on QSAR Modeling – Presented at American Chemi...The influence of data curation on QSAR Modeling – Presented at American Chemi...
The influence of data curation on QSAR Modeling – Presented at American Chemi...
 
Can a Free Access Structure-Centric Community for Chemists Benefit Drug Disco...
Can a Free Access Structure-Centric Community for Chemists Benefit Drug Disco...Can a Free Access Structure-Centric Community for Chemists Benefit Drug Disco...
Can a Free Access Structure-Centric Community for Chemists Benefit Drug Disco...
 
The influence of data curation on QSAR Modeling – examining issues of qualit...
 The influence of data curation on QSAR Modeling – examining issues of qualit... The influence of data curation on QSAR Modeling – examining issues of qualit...
The influence of data curation on QSAR Modeling – examining issues of qualit...
 
R Analytics in the Cloud
R Analytics in the CloudR Analytics in the Cloud
R Analytics in the Cloud
 
Automatic extraction of bioactivity data from patents
Automatic extraction of bioactivity data from patentsAutomatic extraction of bioactivity data from patents
Automatic extraction of bioactivity data from patents
 
Using open bioactivity data for developing machine-learning prediction models...
Using open bioactivity data for developing machine-learning prediction models...Using open bioactivity data for developing machine-learning prediction models...
Using open bioactivity data for developing machine-learning prediction models...
 
2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...
2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...
2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...
 
CINF 2012 talk Recrystallization App
CINF 2012 talk Recrystallization AppCINF 2012 talk Recrystallization App
CINF 2012 talk Recrystallization App
 
GiTools
GiToolsGiTools
GiTools
 
Delivering The Benefits of Chemical-Biological Integration in Computational T...
Delivering The Benefits of Chemical-Biological Integration in Computational T...Delivering The Benefits of Chemical-Biological Integration in Computational T...
Delivering The Benefits of Chemical-Biological Integration in Computational T...
 

More from NextMove Software

CINF 170: Regioselectivity: An application of expert systems and ontologies t...
CINF 170: Regioselectivity: An application of expert systems and ontologies t...CINF 170: Regioselectivity: An application of expert systems and ontologies t...
CINF 170: Regioselectivity: An application of expert systems and ontologies t...
NextMove Software
 

More from NextMove Software (20)

DeepSMILES
DeepSMILESDeepSMILES
DeepSMILES
 
CINF 170: Regioselectivity: An application of expert systems and ontologies t...
CINF 170: Regioselectivity: An application of expert systems and ontologies t...CINF 170: Regioselectivity: An application of expert systems and ontologies t...
CINF 170: Regioselectivity: An application of expert systems and ontologies t...
 
Building a bridge between human-readable and machine-readable representations...
Building a bridge between human-readable and machine-readable representations...Building a bridge between human-readable and machine-readable representations...
Building a bridge between human-readable and machine-readable representations...
 
CINF 35: Structure searching for patent information: The need for speed
CINF 35: Structure searching for patent information: The need for speedCINF 35: Structure searching for patent information: The need for speed
CINF 35: Structure searching for patent information: The need for speed
 
A de facto standard or a free-for-all? A benchmark for reading SMILES
A de facto standard or a free-for-all? A benchmark for reading SMILESA de facto standard or a free-for-all? A benchmark for reading SMILES
A de facto standard or a free-for-all? A benchmark for reading SMILES
 
Recent Advances in Chemical & Biological Search Systems: Evolution vs Revolution
Recent Advances in Chemical & Biological Search Systems: Evolution vs RevolutionRecent Advances in Chemical & Biological Search Systems: Evolution vs Revolution
Recent Advances in Chemical & Biological Search Systems: Evolution vs Revolution
 
Can we agree on the structure represented by a SMILES string? A benchmark dat...
Can we agree on the structure represented by a SMILES string? A benchmark dat...Can we agree on the structure represented by a SMILES string? A benchmark dat...
Can we agree on the structure represented by a SMILES string? A benchmark dat...
 
Comparing Cahn-Ingold-Prelog Rule Implementations
Comparing Cahn-Ingold-Prelog Rule ImplementationsComparing Cahn-Ingold-Prelog Rule Implementations
Comparing Cahn-Ingold-Prelog Rule Implementations
 
Eugene Garfield: the father of chemical text mining and artificial intelligen...
Eugene Garfield: the father of chemical text mining and artificial intelligen...Eugene Garfield: the father of chemical text mining and artificial intelligen...
Eugene Garfield: the father of chemical text mining and artificial intelligen...
 
Chemical similarity using multi-terabyte graph databases: 68 billion nodes an...
Chemical similarity using multi-terabyte graph databases: 68 billion nodes an...Chemical similarity using multi-terabyte graph databases: 68 billion nodes an...
Chemical similarity using multi-terabyte graph databases: 68 billion nodes an...
 
Recent improvements to the RDKit
Recent improvements to the RDKitRecent improvements to the RDKit
Recent improvements to the RDKit
 
Pharmaceutical industry best practices in lessons learned: ELN implementation...
Pharmaceutical industry best practices in lessons learned: ELN implementation...Pharmaceutical industry best practices in lessons learned: ELN implementation...
Pharmaceutical industry best practices in lessons learned: ELN implementation...
 
Digital Chemical Representations
Digital Chemical RepresentationsDigital Chemical Representations
Digital Chemical Representations
 
Challenges and successes in machine interpretation of Markush descriptions
Challenges and successes in machine interpretation of Markush descriptionsChallenges and successes in machine interpretation of Markush descriptions
Challenges and successes in machine interpretation of Markush descriptions
 
PubChem as a Biologics Database
PubChem as a Biologics DatabasePubChem as a Biologics Database
PubChem as a Biologics Database
 
CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...
CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...
CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...
 
Building on Sand: Standard InChIs on non-standard molfiles
Building on Sand: Standard InChIs on non-standard molfilesBuilding on Sand: Standard InChIs on non-standard molfiles
Building on Sand: Standard InChIs on non-standard molfiles
 
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...
 
Advanced grammars for state-of-the-art named entity recognition (NER)
Advanced grammars for state-of-the-art named entity recognition (NER)Advanced grammars for state-of-the-art named entity recognition (NER)
Advanced grammars for state-of-the-art named entity recognition (NER)
 
Challenges in Chemical Information Exchange
Challenges in Chemical Information ExchangeChallenges in Chemical Information Exchange
Challenges in Chemical Information Exchange
 

Recently uploaded

Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformation
Areesha Ahmad
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptx
seri bangash
 
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
Scintica Instrumentation
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
MohamedFarag457087
 

Recently uploaded (20)

FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 
Use of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptxUse of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptx
 
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort ServiceCall Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformation
 
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)
 
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate ProfessorThyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
 
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptxClimate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
 
Dr. E. Muralinath_ Blood indices_clinical aspects
Dr. E. Muralinath_ Blood indices_clinical  aspectsDr. E. Muralinath_ Blood indices_clinical  aspects
Dr. E. Muralinath_ Blood indices_clinical aspects
 
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIACURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
 
Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its Functions
 
Introduction of DNA analysis in Forensic's .pptx
Introduction of DNA analysis in Forensic's .pptxIntroduction of DNA analysis in Forensic's .pptx
Introduction of DNA analysis in Forensic's .pptx
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
Chemistry 5th semester paper 1st Notes.pdf
Chemistry 5th semester paper 1st Notes.pdfChemistry 5th semester paper 1st Notes.pdf
Chemistry 5th semester paper 1st Notes.pdf
 
Exploring Criminology and Criminal Behaviour.pdf
Exploring Criminology and Criminal Behaviour.pdfExploring Criminology and Criminal Behaviour.pdf
Exploring Criminology and Criminal Behaviour.pdf
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptx
 
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
 
An introduction on sequence tagged site mapping
An introduction on sequence tagged site mappingAn introduction on sequence tagged site mapping
An introduction on sequence tagged site mapping
 

CINF 13: Pistachio - Search and Faceting of Large Reaction Databases

  • 1. CINF 13, ACS Fall 2017, Washington, D.C. pistachio Search and Faceting of Large Reaction Databases John Mayfield, Daniel Lowe, Roger Sayle
  • 2. What do Synthetic Chemists Want from Their Reaction Systems? CINF 13, ACS Fall 2017, Washington, D.C. Data ClassificationDiagrams Search
  • 4. Infrastructure for liberating and processing reactions from Electronic Lab Notebooks (ELNs)
      • HazELNut, Filbert, NameRXN, Cobnut
      • Accelrys Pipeline Pilot (AstraZeneca, AbbVie & Hoffmann-La Roche)
      • ChemAxon JChem Cartridge (GlaxoSmithKline & Novartis)
      • Elsevier Reaxys (Hoffmann-La Roche, AstraZeneca, Merck)
      • Perkin Elmer Informatics (formerly CambridgeSoft) eNotebook v9, v11 or v13, or Symyx ELN v5.x or v6.x
      • Oracle Server version 10, 11 or
      • Microsoft Windows, Linux or Mac OS
  • 5. To 7-chloro-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid (Peakdale) (220 mg, 1.025 mmol) and (3,4-dimethoxyphenyl)boronic acid (187 mg, 1.025 mmol) in 1,4-dioxane (3 mL) and water (1.5 mL) was added sodium carbonate (435 mg, 4.10 mmol) and tetrakis(triphenylphosphine)palladium(0) (110 mg, 0.095 mmol). The reaction was heated in the microwave at 80° C. for 2 hours and at 100° C. for a further 2 hours. The solvent was removed and the residue was suspended in DMSO, filtered and purified by MDAP. Appropriate fractions were combined and the solvent removed to give 7-(3,4-dimethoxyphenyl)-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid (25 mg, 7%) as a yellow solid. (US 2016/16966 A1, [0517]) Daniel M. Lowe. Extraction of chemical structures and reactions from the literature. Ph.D. Thesis, University of Cambridge, 2012
  • 6. Unstructured text to a structured reaction table (LeadMine + Chemical Tagger). Daniel M. Lowe. Extraction of chemical structures and reactions from the literature. Ph.D. Thesis, University of Cambridge, 2012
      • Text (US 2016/16966 A1, [0517]): To 7-chloro-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid (Peakdale) (220 mg, 1.025 mmol) and (3,4-dimethoxyphenyl)boronic acid (187 mg, 1.025 mmol) in 1,4-dioxane (3 mL) and water (1.5 mL) was added sodium carbonate (435 mg, 4.10 mmol) and tetrakis(triphenylphosphine)palladium(0) (110 mg, 0.095 mmol). The reaction was heated in the microwave at 80° C. for 2 hours and at 100° C. for a further 2 hours. The solvent was removed and the residue was suspended in DMSO, filtered and purified by MDAP. Appropriate fractions were combined and the solvent removed to give 7-(3,4-dimethoxyphenyl)-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid (25 mg, 7%) as a yellow solid.
      • Product Properties
          • 7-(3,4-dimethoxyphenyl)-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid: 25 mg, 7% yield, yellow solid
      • Reactant Properties
          • 7-chloro-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid: 220 mg, 1.025 mmol
          • (3,4-dimethoxyphenyl)boronic acid: 187 mg, 1.025 mmol
      • Agent Properties
          • 1,4-dioxane: 3 mL
          • water: 1.5 mL
          • sodium carbonate: 435 mg, 4.10 mmol
          • tetrakis(triphenylphosphine)palladium(0): 110 mg, 0.095 mmol
          • DMSO
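One step in going from the patent paragraph to the reaction table on slide 6 is pulling the parenthesised amounts out of the text. The sketch below is a toy regular-expression pass written for this transcript, not LeadMine or Chemical Tagger; the function name `extract_quantities`, the patterns, and the short list of units are all assumptions chosen to fit this one example.

```python
import re

# Assumed unit list; only the units that occur in the example paragraph.
UNITS = r"(?:mg|mmol|mL|%)"
# One "number unit" item, e.g. "220 mg" or "7%".
QUANTITY = re.compile(rf"(\d+(?:\.\d+)?)\s*({UNITS})")
# A parenthesised group made only of such items, e.g. "(220 mg, 1.025 mmol)".
# Groups like "(Peakdale)", "(3,4-dimethoxyphenyl)" or "(0)" do not match.
GROUP = re.compile(rf"\(((?:\s*\d+(?:\.\d+)?\s*{UNITS}\s*,?)+)\)")


def extract_quantities(text):
    """Return one list of (value, unit) pairs per parenthesised quantity group."""
    groups = []
    for match in GROUP.finditer(text):
        pairs = [(float(v), u) for v, u in QUANTITY.findall(match.group(1))]
        groups.append(pairs)
    return groups


text = (
    "To 7-chloro-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid (Peakdale) "
    "(220 mg, 1.025 mmol) and (3,4-dimethoxyphenyl)boronic acid (187 mg, 1.025 mmol) "
    "in 1,4-dioxane (3 mL) and water (1.5 mL) was added sodium carbonate "
    "(435 mg, 4.10 mmol) and tetrakis(triphenylphosphine)palladium(0) (110 mg, 0.095 mmol)."
)

for group in extract_quantities(text):
    print(group)
```

Associating each amount with the right chemical name and role (reactant, agent, product) is the harder part and is not attempted here.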
  • 7. Data impact. Public subset released in 2014 as CC-Zero. Pistachio expands the scope of the data and uses Atom-Atom Maps from NameRxn.
      • Christos Nicolaou et al. The Proximal Lilly Collection: Mapping, Exploring and Exploiting Feasible Chemical Space. J. Chem. Inf. Model., 2016, 56 (7), pp 1253–1266
      • Nadine Schneider et al. Big Data from Pharmaceutical Patents: A Computational Analysis of Medicinal Chemists’ Bread and Butter. J. Med. Chem., 2016, 59 (9), pp 4385–4402
      • Nadine Schneider et al. Development of a Novel Fingerprint for Chemical Reactions and Its Application to Large-Scale Reaction Classification and Similarity. J. Chem. Inf. Model., 2015, 55 (1), pp 39–53
      • Nadine Schneider et al. What’s What: The (Nearly) Definitive Guide to Reaction Role Assignment. J. Chem. Inf. Model., 2016, 56 (12), pp 2336–2346
      • Connor Coley et al. Prediction of Organic Reaction Outcomes Using Machine Learning. ACS Cent. Sci., 2017, 3 (5), pp 434–443
  • 8. Sketch extraction: NextMove’s Praline. Example 26, US 09718816 B2 (Epizyme Inc., 1-phenoxy-3-(alkylamino)-propan-2-ol derivatives as CARM1 inhibitors and uses thereof, Aug. 1, 2017): Step 1, Step 2, Step 3, Step 4, etc. John May, et al. Sketchy Sketches: Hiding Chemistry in Plain Sight. Seventh Joint Sheffield Conference on Cheminformatics. 2016
  • 9. Total reactions over time. [Chart: cumulative reaction details, 0 to 3.5M, by year from 1997 to 2018; series: EPO Applications, EPO Grants, USPTO Applications, USPTO Grants]
  • 10. What do Synthetic Chemists Want from Their Reaction Systems? Data, Classification, Diagrams, Search
  • 11. Reaction diagrams. Good reaction diagrams are essential in communicating synthetic chemistry. Layout can be stored or generated:
      • When extracting from text, layout must be generated
      • Generated diagrams can be unsatisfactory for display
  • 13. ChemAxon, BIOVIA: generated from SMILES for US 2016/16966 A1 [0517]
  • 14. Diagram improvements
      • Typical workarounds:
          • Separately render molecules
          • Hide agents and list separately
      • What do humans do:
          • Wrap products below
          • Abbreviate functional groups and agents
          • Orientate reactants to products and vice versa
          • Hide agents and list as text
  • 17. What do Synthetic Chemists Want from Their Reaction Systems? Data, Classification, Diagrams, Search
  • 18. NameRxn. Assigns names to 900+ reactions using transformations. Can guarantee perfect Atom-Atom Mapping:
      • Atom-Atom Mapping is an output, not an input
      • MCS mappers struggle with rearrangements: 4.1.6 Cyclic Beckmann rearrangement
  • 19. Concepts and RXNO
      • 1 Heteroatom alkylation and arylation
          • 1.7 O-substitution: 1.7.1 Chan-Lam ether coupling; 1.7.2 Diazomethane esterification; 1.7.3 Ethyl esterification; 1.7.4 Hydroxy to methoxy; 1.7.5 Hydroxy to triflyloxy; 1.7.6 Methyl esterification; 1.7.n
      • 2 Acylation and related processes
          • 2.6 O-acylation to ester: 2.6.1 Ester Schotten-Baumann; 2.6.2 Esterification (generic); 2.6.3 Fischer-Speier esterification; 2.6.4 Baeyer-Villiger oxidation; 2.6.5 Yamaguchi esterification; 2.6.6 Hydroxy to imidazolecarbonyloxy; 2.6.7 Imidazolecarbonyl to ester; 2.6.8 Hydroxy to acetoxy; 2.6.9 Steglich esterification; 2.6.n
  • 20. Concepts and RXNO
      • 1 Heteroatom alkylation and arylation
          • 1.7 O-substitution: 1.7.1 Chan-Lam ether coupling; 1.7.2 Diazomethane esterification; 1.7.3 Ethyl esterification; 1.7.4 Hydroxy to methoxy; 1.7.5 Hydroxy to triflyloxy; 1.7.6 Methyl esterification; 1.7.n
      • 2 Acylation and related processes
          • 2.6 O-acylation to ester: 2.6.1 Ester Schotten-Baumann; 2.6.2 Esterification (generic); 2.6.3 Fischer-Speier esterification; 2.6.4 Baeyer-Villiger oxidation; 2.6.5 Yamaguchi esterification; 2.6.6 Hydroxy to imidazolecarbonyloxy; 2.6.7 Imidazolecarbonyl to ester; 2.6.8 Hydroxy to acetoxy; 2.6.9 Steglich esterification; 2.6.n
      • Esterification (7), Chan-Lam coupling (3), Schotten-Baumann Reaction (9)
      • RXNO: http://github.com/rsc-ontologies/rxno
  • 21. Result facets. Provides a summary over the key concepts of results. Cut through the information deluge and refine search.
      • Reaction Types (NextMove ontology tree)
      • Drug Targets (ChEMBL ontology tree)
      • Disease Targets (MeSH ontology tree)
      • Yields
      • Affiliation (NextMove ontology tree)
      • Publication Date, Documents, Authors
  • 22. Facet calculation. Resource expensive: O(n) in the size of the result set.
      • Client, server, or database?
      • Overhead copying and transferring data that is not needed
      • Calculate when requested or up-front?
      • Custom cartridge: 2.9 seconds to summarise all 6.6 million rows (Intel(R) Core(TM) i7-6900K CPU @ 3.20GHz)
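The O(n) cost noted on slide 22 can be illustrated with a single pass over a result set that counts every reaction class together with its ancestors in the class tree. This is a minimal sketch, assuming each hit carries a dotted class code in the style shown on the classification slides (for example 1.7.1 or 2.6.3); the function name `reaction_type_facets` and the sample hits are hypothetical, and this is not the custom cartridge described in the talk.

```python
from collections import Counter


def reaction_type_facets(class_codes):
    """Count each class code and all of its ancestors in one pass over the hits.

    The work is proportional to the number of hits, since the depth of a
    dotted code is small and fixed.
    """
    counts = Counter()
    for code in class_codes:
        parts = code.split(".")
        # "2.6.3" contributes to "2", "2.6" and "2.6.3".
        for depth in range(1, len(parts) + 1):
            counts[".".join(parts[:depth])] += 1
    return counts


# Hypothetical result set: one class code per reaction hit.
hits = ["1.7.1", "1.7.4", "2.6.3", "2.6.3", "2.6.9"]
facets = reaction_type_facets(hits)

for code in sorted(facets):
    print(code, facets[code])
```

Counting ancestors at the same time means a tree-shaped facet can be drawn at any level without a second scan of the results.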
  • 23. What do Synthetic Chemists Want from Their Reaction Systems? Data, Classification, Diagrams, Search
  • 24. One entry point: Systematic Name, Date Range, Trivial Name, Yield Range, Affiliation, Reaction SMARTS, Disease Target, Document, Line Formula, SMILES, InChI, Author, Protein Target, Collection, Reaction Type (NameRxn), SMARTS, Source, …and logical combinations thereof
  • 25. Suggestions
      • Based on global frequency
      • Based on context frequency
  • 26. Structure search technology: NextMove’s Arthor Technology. Up to 100x faster than state-of-the-art. Combination of SMARTS compilation and efficient storage. Preliminary PostgreSQL integration. Benchmark: ~3.5K queries against ~7M structures (eMolecules 2014), all on the same hardware:
      • Arthor: 36s
      • BIOVIA Direct (Oracle): 56m
      • Bingo (NoSQL): 1h
      • Bingo (PostgreSQL): 1h54m
      • Bingo (Oracle): 2h6m
      • JChem (Oracle): 2h41m
      • RDCart (PostgreSQL): 5h9m
      • pgchem (PostgreSQL): 13h54m
      • mychem (MySQL): 1d1h52m
      • orchem (Oracle): 3d1h13m
      John May and Roger Sayle, Substructure Search Face-off, May 2015
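The benchmark timings on slide 26 can be put on a common scale by converting each to seconds and dividing by the Arthor time. The sketch below assumes that each time in the transcript precedes the tool it belongs to (so 36s is Arthor and 3d1h13m is orchem); the helper names `to_seconds` and `slowdown_vs` are hypothetical.

```python
import re

UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def to_seconds(duration):
    """Convert a compact duration such as '1d1h52m' into seconds."""
    return sum(int(n) * UNIT_SECONDS[u]
               for n, u in re.findall(r"(\d+)([dhms])", duration))


# Timings as read from the slide (assumed pairing: time, then tool name).
TIMINGS = {
    "Arthor": "36s",
    "BIOVIA Direct (Oracle)": "56m",
    "Bingo (NoSQL)": "1h",
    "Bingo (PostgreSQL)": "1h54m",
    "Bingo (Oracle)": "2h6m",
    "JChem (Oracle)": "2h41m",
    "RDCart (PostgreSQL)": "5h9m",
    "pgchem (PostgreSQL)": "13h54m",
    "mychem (MySQL)": "1d1h52m",
    "orchem (Oracle)": "3d1h13m",
}


def slowdown_vs(baseline, timings=TIMINGS):
    """Return each tool's total time as a multiple of the baseline tool's time."""
    base = to_seconds(timings[baseline])
    return {name: to_seconds(t) / base for name, t in timings.items()}


ratios = slowdown_vs("Arthor")
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x")
```

Under this reading, 56m against 36s is a ratio of about 93 and 1h against 36s is exactly 100. These are ratios of total wall-clock time over the whole query set, not per-query figures.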
  • 27. Refining structure search. Intention can be refined by qualifiers:
      • Role: {structure} product
      • Substructure: {structure} substructure; {structure} substructure product
      • Make/Break: Synthesis of {structure}
      • Combined with other terms: {structure} substructure product and yield of 80%
  • 30. Acknowledgements: Noel O’Boyle (NextMove Software), Egon Willighagen (CDK), James Davison, Matt Swain (Vernalis). What do Synthetic Chemists Want from Their Reaction Systems? Data, Classification, Diagrams, Search. Pistachio: http://www.nextmovesoftware.com/pistachio.html. Come find me around ACS for a demo! See also: CINF 90