How predictive models help Medicinal Chemists design better drugs_webinar

Olivier Barberan
Senior Product Manager
Data driven drug discovery :
How Predictive models help medicinal
chemists design better drugs.

Putting Data to Work |
Today’s discussion
• Elsevier is a partner and vendor to 100% of the world’s top
pharmaceutical companies
• It provides leading data solutions in chemistry, biology, drug safety
and biomedical review
• Our workflow solutions are based on clearly defined use cases
BUT as with many data-driven companies, our data is used for specific
purposes via different user interfaces.
Our Challenge: How to externalize data for:
• Complementary data integration
• Bioactivity prediction
• Use in additional use cases and workflows
2

The Challenge Pharma face : Investing in earlier development stages
builds up the pipeline and reduces attrition from foreseeable causes
3
Target ID &
Validation
Lead ID &
Validation
Pre-clinical Clinical
(Phase I to III)
Post-Launch
Characterize &
understand
disease
Identify, design &
validate leads
Cull/prioritize leads Determine safety
and efficacy profile
Manage risk &
compliance; improve
patient care
Source: Tufts Center for the study of drug development, Nov 2014
$125 M $773 M $200 M $1,460 M $3–5 B
Cost (/NME)
“We cannot fail for reasons we could have predicted. We
should fail only for reasons we could not predict.”
—Dr Moncef Slaoui
Head of Global R&D, GSK
• Low margin of safety is a major
cause of attrition in Phase I and II
• Lack of efficacy is a major cause of
attrition in Phase II and III
Better informed decisions at the Lead ID
& Validation stage generates more
optimized leads and mitigates failures
and miss-investments
80% 65% 69% 12%
Success Rate
“Usability & Re-usability of data is critical to our success.
We are experts at capturing data for purpose, and poor at
making that data available for any purpose ”
—Dr M. Ramsey,
Chief Data Officer, GSK

| 4
Excerption of facts – how Reaxys Medicinal
Chemistry handles the literature
Excerption of facts and Indexing of literature and patents is done by
pharmacologists for Medicinal chemists
Substance-record
Bibliographic-record
ExcerptionIndexing
Bioactivity-record

Potency &
selectivity
DMPK properties
Physical properties
Safety
pharmacology
The lead optimization challenge: Optimization of early
substance to potential drug

Hit Identification Lead gen. and optimization
Drug discovery involves different personas and needs
Medicinal chemists: “I need to
optimize chemical structures in order to
improve affinity, selectivity ADMET
properties and decrease side effects”
Computational chemists:
“I need to find hits for druggable targets”
(virtual screening)
Integration /
Professional
Services
In house data

Hit to lead : Virtual screening

Ligand-based virtual screening – Using Reaxys Medicinal
Chemistry
Objective
• Describe an In Silico Screening approach
using Reaxys Medicinal Chemistry
Case Study on T-Type calcium channels

Ligand-based in silico screening
730 Query structures
Representation & Chemical
Space Molecular descriptors &
Fingerprints
Virtual Screening
Pharmacophoric Similarity
N
O
N
N
N
O
N
N
N
314 Hits
"Drug-like" Filtering
1. Molecular diversity and chemical originality
2. Compounds availability
39 compounds ordered for testing
28 M Substances
Chemical space based on
synthesized substances
Pipeline pilot

Putting Data to Work | 11
Biological testing of the 39 hits on Cav3.2
Electrophysiology experiments: Screening @10 µM
on Cav3.2 T-Type channels
1 2 3 4 5 6 7 8 9 101112131415161718192021222324252627 2930313233343536373839
0
25
50
75
100
Peakcurrentinhibition(%)
28
9 compounds with a % inhibition > 75%
15 compounds with a % inhibition >50%
Compound # (@ 10µM)
Back to
referring
slide
Reaxys Medicinal Chemistry supports Ligand-based virtual
screening approaches

Lead optimization
Predictive Modeling of On-And-Off Target Bioactivities

TITLE OF PRESENTATION | 13
Issue
• Need to make best possible decisions to rank/decide on drug
candidates
• Use all available information to help make decisions about potential
therapeutics
Method
• Build hundreds of models for on and off-target activities
• Create a model building engine in KNIME that can make models for large
lists of targets using data from Reaxys Medicinal Chemistry.
• Selectivity panels.
• Toxicology pathways (from pathway analysis).
• PK/ADME models
Challenge: Make best decisions based on data

TITLE OF PRESENTATION |
• Issue: some targets have thousands of measurements
• Solution: Randomly select N examples from each decade of pX activity
• Selecting from each decade provides a more balanced data set across the activity range
• In this study only IC50 measurements are used, not % inhibition or Kd
• N is from 50 to 200 in these examples.
• Compounds are selected so that training and test sets have no overlap
• Use random selection of both
• For each compound the median pX value is chosen when multiple measurements are
reported
• pX = -log(molar activity)
• Partial-least-squares correlation is used, selecting the optimal number of descriptor
components
• Model quality is assessed
• I like RMS error of prediction because it most directly answers the question “how accurate do I expect the
prediction to be?”
• r2, F and other statistics are also computed.
14
Elements of the model making workflow

Random Selection to ‘Flatten’ Activity Distribution
1 2 43 65 7 8 9 10
-log(M)
#of
compounds
1 2 43 65 7 8 9 10
-log(M)
#of
compounds
Raw reports – may be thousands for a
popular target. Activities and structure
distribution may skew statistics
Randomly selected ‘N’
compounds per decade
Result - diverse set of
structures and more uniform
distribution of activities

High-Level workflow for creating models
• Uses a list of UNIPROT target names, and NCBI
synonyms
• Attempts to create a model for each target
• Rejects models for insufficient data, or low
quality

• Powerful statistical methods can select variables to create a model that looks great on
the training set
• However the real test is how well the model can predict activity for a compound not in
the training set.
• In these examples the Training and Test sets are not overlapping; no test compound is
in the training set.
• In this study the best metric for the model is the RMS Error of Prediction – the errors
seen when predicting activities for compounds not in the training set.
• This is useful to provide the “error bar” for the predictions.
• The required accuracy depends on what you are using the prediction for
• The Pearson r2 provides a metric of “fit” but is dependent on the range of the data and does not
directly tell you how much error to expect.
• Other metrics for prediction might include how similar the compound is any compounds in the training
set. One might expect the predictions to be less reliable if the compound is very different.
17
The Importance of the Test Set

Model Engine
• Creates series of Reaxys queries for each decade of activity
• Uses PLS/R to create and evaluate models
• Creates spreadsheet with RMS error of prediction and plots

• The descriptors are properties computed for the compounds that are
related to the activities
• They encode elements of the structure and other properties of the
compounds.
• In this study we combined 2 sets of computed descriptors
• RD Kit
• CDK Toolkit
19
Descriptors

• Results are similar for this ABL example.
• For ‘final’ models larger selections or all data points can be used to increase
robustness
• The automation aspect makes repeated trials easy to do
20
Q: Since random sets are selected, what happens if you repeat trials?
A: The results are nearly identical in a statistical sense
Model
RMS Error of
Prediction
Training Set 0.63
Test Set 1.6
Model
RMS Error of
Prediction
Training Set 0.62
Test Set 1.0
y = 0.9924x
R² = 0.8803
0
2
4
6
8
10
12
14
16
0 2 4 6 8 10 12 14 16
pXPredicted
pX Measured
Training Set
y = 0.9964x
R² = 0.7402
0
2
4
6
8
10
12
14
16
0 2 4 6 8 10 12 14 16
pXPredicted
pX Measured
Test Set
y = 0.9924x
R² = 0.8804
0
2
4
6
8
10
12
14
16
0 2 4 6 8 10 12 14 16
pXPredicted
pX Measured
Training Set y = 1.0049x
R² = 0.4629
0
2
4
6
8
10
12
14
16
0 2 4 6 8 10 12 14 16
pXPredicted
pX Measured
Test Set
y = 0.9914x
R² = 0.8597
0
2
4
6
8
10
12
14
16
0 2 4 6 8 10 12 14 16
pXPredicted
pX Measured
Training Set
y = 0.9959x
R² = 0.7053
0
2
4
6
8
10
12
14
16
0 2 4 6 8 10 12 14 16
pXPredicted
pX Measured
Test Set
Model
RMS Error of
Prediction
Training Set 0.66
Test Set 1.0

• Training set, test set
• Statistics for training and test sets.
21
Spreadsheet created for each model

EGFR Model Example
y = 0.9829x
R² = 0.8205
0
2
4
6
8
10
12
14
16
0 2 4 6 8 10 12 14 16
pXPredicted
pX Measured
EGFR Training Set y = 0.9765x
R² = 0.7151
0
2
4
6
8
10
12
14
16
0 2 4 6 8 10 12 14 16
pXPredicted
pX Measured
EGFR Test Set
RMSEP Type target protein trainingSetSize testSetSize
0.97 Training Set P00533 EGFR 1583
1.2 Test Set P00533 EGFR 521

0.76 Training Set O60674 JAK2 1372
0.89 Test Set O60674 JAK2 446
23
JAK2 Model Example
y = 0.9889x
R² = 0.7638
0
2
4
6
8
10
12
14
16
0 2 4 6 8 10 12 14 16
pXPredicted
pX Measured
JAK2 Training Set y = 0.9956x
R² = 0.6749
0
2
4
6
8
10
12
14
16
0 2 4 6 8 10 12 14 16
pXPredicted
pX Measured
JAK2 Test Set

0.65 Training Set P49841 GSK3β 1271
0.97 Test Set P49841 GSK3β 427
24
GSK3 Model Example
y = 0.9911x
R² = 0.813
0
2
4
6
8
10
12
14
16
0 2 4 6 8 10 12 14 16
pXPredicted
pX Measured
GSK3β Training Set y = 1.0008x
R² = 0.5678
0
2
4
6
8
10
12
14
16
0 2 4 6 8 10 12 14 16pXPredicted
pX Measured
GSK3β Test Set

0.90 Training Set P98161 PKD1 60
1.05 Test Set P98161 PKD1 26
25
Example of Poor Model
y = 0.9795x
R² = -3.788
0
2
4
6
8
10
12
14
16
0 2 4 6 8 10 12 14 16
pXPredicted
pX Measured
PDK1 Training Set
y = 0.9867x
R² = -15.88
0
2
4
6
8
10
12
14
16
0 2 4 6 8 10 12 14 16
pXPredicted pX Measured
PDK1 Test Set

Prediction of Activities
• Reads in all models
• Reads in a set of molecules in SDF format
• Predicts activity for each model
• Computes Z value for classification of active/inactive of pX > 6
• Classify as active/inactive based on desired confidence

• Yellow  Black increasing predicted inhibition based on Z value
• Selectivity metric - % of kinases highly inhibited
27
HeatMap for Kinase panel
predIRAK3
predO00141SGK1
predO14757CHK1
predO14920ikbkb
predO14976GAK
predO15264p38Î´MAPK
predO15530PDK1
predO43353RIP2
predO60674JAK2
predO75116ROCK2
predO75582MSK1
predO96013PAK4
predO96017CHK2
predP00519ABL1
predP00533EGFR
predP04049c-Raf
predP04626ERBB2
predP06239Lck
predP06493CDK1
predP11309PIM1
predP12931Src
predP15056BRAF
predP15735PHK
predP17252PKCÎ±
predP23443S6K1
predP23458JAK1
predP24941CDK2-CyclinA
predP27361ERK1
predP27448MARK3
predP27986PIK3
predP28028B-Raf
predP28482ERK2
predP31749AKT1
predP35968VEGFR2
predP36507MEK2
predP36888FLT3
predP41240CSK
predP45983JNK1
predP45984JNK2
predP48730CK1Î´
predP49841GSK3Î²
predP51812RSK2
predP51955NEK2a
predP53778p38Î³MAPK
predP53779JNK3
predP98161PKD1
predQ02750MEK1
predQ05513PKCÎ¶
predQ13131AMPK
predQ13188MST2
predQ13627DYRK1A
predQ14012CaMK1
predQ14680MELK
predQ15418RSK1
predQ15746SmMLCK
predQ16513PRK2
predQ16539p38Î±MAPK
predQ86V86PIM3
predQ8IW41PRAK
predQ8IWQ3BRSK2
predQ8N5S9CaMKKÎ±
predQ8TD08ERK8
predQ92630DYRK2
predQ96RR4CaMKKÎ²
predQ96SB4SRPK1
predQ9BUB5MNK1
predQ9H2X6HIPK2
predQ9H422HIPK3
predQ9HBH9MNK2
predQ9P286PAK5
predQ9UQB9AuroraC
selectivity
Molecule
M
olecule
nam
e
0 1 1 1 0 1 1 1 1 0 0 0 0 0 0 0 1 2 1 0 1 -1 0 1 -1 0 0 -1 0 0 3 -1 0 0 0 0 0 1 2 1 0 1 0 0 -1 1 0 -1 0 0 0 -1 -1 0 -1 -1 0 0 -1 0 0 1 0 0 0 -2 0 1 1 0 2 31%
fedratinib
0 -1 -1 0 1 1 0 0 2 2 0 1 -2 -1 0 0 0 -1 -1 0 -3 2 -1 0 1 1 0 -3 0 0 3 1 2 1 -1 0 0 2 -1 1 1 0 0 0 0 0 -2 -2 0 0 0 -1 -2 0 -2 0 0 -2 -3 0 0 1 0 0 0 0 1 0 0 0 0 24%
ruxolitinib
0 -1 0 1 1 1 1 0 1 0 0 0 0 1 1 0 2 -3 1 1 1 -1 0 -1 0 -1 0 0 0 1 3 0 1 1 0 0 -2 0 -1 1 0 1 0 0 -1 0 0 -1 0 0 0 -1 1 0 -1 -1 0 1 -3 0 0 1 0 0 0 0 1 -1 -1 0 0 30%
gefitinib
0 0 1 2 0 1 1 1 0 0 0 0 1 0 1 1 0 2 0 -1 0 0 0 0 -2 0 1 0 0 0 3 0 0 0 1 -2 -1 0 1 1 0 1 0 1 0 0 0 -1 0 0 -1 -1 0 0 -1 -1 1 0 -2 0 0 1 0 0 0 -3 0 1 1 0 2 30%
imatinib
1 -1 2 2 1 1 1 1 2 2 1 2 0 0 0 3 -3 -4 2 2 2 1 2 4 4 0 3 -1 1 3 -8 -1 2 0 2 3 0 2 4 1 2 3 0 1 0 2 2 0 2 2 2 -1 2 2 2 3 5 3 2 1 1 1 1 2 0 4 0 0 1 1 2 73%
staurosporine

Fedratinib – trials stopped due to cerebral edema in patients.

IRAK3 identified by pathway analysis as significant for brain
toxicity
Rinker, Predicting adverse drug events using literature-based pathway analysis 251st National Meeting of the
American Chemical Society, San Diego CA March 13-17, 2016

• Yellow  Black increasing predicted inhibition based on Z value
• Selectivity metric - % of kinases highly inhibited
31
HeatMap for Kinase panel
predIRAK3
predO00141SGK1
predO14757CHK1
predO14920ikbkb
predO14976GAK
predO15264p38Î´MAPK
predO15530PDK1
predO43353RIP2
predO60674JAK2
predO75116ROCK2
predO75582MSK1
predO96013PAK4
predO96017CHK2
predP00519ABL1
predP00533EGFR
predP04049c-Raf
predP04626ERBB2
predP06239Lck
predP06493CDK1
predP11309PIM1
predP12931Src
predP15056BRAF
predP15735PHK
predP17252PKCÎ±
predP23443S6K1
predP23458JAK1
predP24941CDK2-CyclinA
predP27361ERK1
predP27448MARK3
predP27986PIK3
predP28028B-Raf
predP28482ERK2
predP31749AKT1
predP35968VEGFR2
predP36507MEK2
predP36888FLT3
predP41240CSK
predP45983JNK1
predP45984JNK2
predP48730CK1Î´
predP49841GSK3Î²
predP51812RSK2
predP51955NEK2a
predP53778p38Î³MAPK
predP53779JNK3
predP98161PKD1
predQ02750MEK1
predQ05513PKCÎ¶
predQ13131AMPK
predQ13188MST2
predQ13627DYRK1A
predQ14012CaMK1
predQ14680MELK
predQ15418RSK1
predQ15746SmMLCK
predQ16513PRK2
predQ16539p38Î±MAPK
predQ86V86PIM3
predQ8IW41PRAK
predQ8IWQ3BRSK2
predQ8N5S9CaMKKÎ±
predQ8TD08ERK8
predQ92630DYRK2
predQ96RR4CaMKKÎ²
predQ96SB4SRPK1
predQ9BUB5MNK1
predQ9H2X6HIPK2
predQ9H422HIPK3
predQ9HBH9MNK2
predQ9P286PAK5
predQ9UQB9AuroraC
selectivity
Molecule
M
olecule
nam
e
0 1 1 1 0 1 1 1 1 0 0 0 0 0 0 0 1 2 1 0 1 -1 0 1 -1 0 0 -1 0 0 3 -1 0 0 0 0 0 1 2 1 0 1 0 0 -1 1 0 -1 0 0 0 -1 -1 0 -1 -1 0 0 -1 0 0 1 0 0 0 -2 0 1 1 0 2 31%
fedratinib
0 -1 -1 0 1 1 0 0 2 2 0 1 -2 -1 0 0 0 -1 -1 0 -3 2 -1 0 1 1 0 -3 0 0 3 1 2 1 -1 0 0 2 -1 1 1 0 0 0 0 0 -2 -2 0 0 0 -1 -2 0 -2 0 0 -2 -3 0 0 1 0 0 0 0 1 0 0 0 0 24%
ruxolitinib
0 -1 0 1 1 1 1 0 1 0 0 0 0 1 1 0 2 -3 1 1 1 -1 0 -1 0 -1 0 0 0 1 3 0 1 1 0 0 -2 0 -1 1 0 1 0 0 -1 0 0 -1 0 0 0 -1 1 0 -1 -1 0 1 -3 0 0 1 0 0 0 0 1 -1 -1 0 0 30%
gefitinib
0 0 1 2 0 1 1 1 0 0 0 0 1 0 1 1 0 2 0 -1 0 0 0 0 -2 0 1 0 0 0 3 0 0 0 1 -2 -1 0 1 1 0 1 0 1 0 0 0 -1 0 0 -1 -1 0 0 -1 -1 1 0 -2 0 0 1 0 0 0 -3 0 1 1 0 2 30%
imatinib
1 -1 2 2 1 1 1 1 2 2 1 2 0 0 0 3 -3 -4 2 2 2 1 2 4 4 0 3 -1 1 3 -8 -1 2 0 2 3 0 2 4 1 2 3 0 1 0 2 2 0 2 2 2 -1 2 2 2 3 5 3 2 1 1 1 1 2 0 4 0 0 1 1 2 73%
staurosporine
IRAK3
JAK2 JAK1

• This tool makes it very easy to do answer many many questions with
minimal time and effort, as shown.
 One can add ability to search for specific scaffolds and combine with target query.
• It provides an unprecedented tool to explore QSAR by target, class and
other criteria.
• One can assess the likelihood of sufficient data to make a model for a
given target
 The automatic evaluation using a test set, provides nearly instant feedback on the
prospect of making a predictive model.
• Because it uses the KNIME system, nearly any descriptor set can be
used for making the models.
 There are additional open-source and commercial descriptor tools available.
 One can probably get better results with better descriptors.
32
Conclusions

ADMET Properties influencing medicinal Chemistry design

Prediction of ADMET properties
Influencing medicinal chemistry design
• logD7.4
• Protein Binding
• Solubility
• Metabolic
Stability
• hERG
• Etc…
Step1
•Rat PPB
• Hu heps
• CYP inhib
• Caco2
• NaV1.5
• Etc…
Step2
• logD7.4
• Solubility
•Protein Binding
• hERG
• Rat PPB
• Metabolic
Stability
• CYP inhib
• Caco2
• etc.
Step 0
AstraZeneca’s global HERG QSAR model70 has contributed to the
reduction in the synthesis of ‘red flag’ compounds (compounds that are
measured to have an HERG potency of <1μM)
from 25.8% of all compounds tested in 2003 to only 6% in 2010.
Cumming, J.G., Davis, A.M. et al. Nat. Rev. Drug Disc. (2013) 12, 948–962

Prediction of Cardiotoxic drugs related to hERG blockade

Putting Data to Work | 36
How to avoid hERG inhibition?
QT interval prolongation can lead to serious arrhythmias which can evolve into a fatal issue.
A large number of non-cardiac drugs-induced QT prolongation has been reported and
continues to increase with the withdrawal of some blockbuster medicines.
hERG seems to be the main target of this adverse side effect.
In silico models could be rapid and powerful tools to screen out potential hERG blockers as
early as possible during the discovery process

Extraction & Methodology
hERG data set
640 mol
Recursive Partitioning (RP)
MOE QuaSAR Classify
Molecules tested on hERG (Kv11.1)
2D-Molecular descriptors sets Predictive models
Subsets of molecules
According to biological detailed protocols
Representation of hERG ligands within chemical space
of NCI database according to the two first PCA axis
NCI database
hERG data set
Cross-validation
External validation

Model 3 HIGH WEAK
1 µM 10 µM 50 / 9 mol50 / 46 mol
Training Test
Descriptors Relevant P_VSA Relevant P_VSA
High 48/50 47/50 45/46 42/46
Weak 49/50 49/50 8/9 9/9
All 97% 96% 96% 93%
Correct classifications determined by 5-fold cross-validation (Training) and
by external validation (Test) for each descriptor set.
SlogP_VSA7
SlogP_VSA2SMR_VSA6
PEOE_VSA+1
+ +
+ +
+
-
- -
-
-
SMR_VSA5 SMR_VSA6
SMR_VSA5
SMR_VSA4
PEOE_VSA+0
+
-
SlogP
PEOE_VSA_FHYD
SMR_VSA1
SMR_VSA6
SMR_VSA1
SlogP vsa_pol
-
-
-- ++
Relevant P_VSA

Summary
 Reaxys Medicinal Chemistry enables the retrieval of high quality datasets with
chemical diversity and homogeneous biological activities.
 Pertinent predictive models of hERG activity have been designed using
recursive partitioning analysis with 2D-molecular descriptors.
 From Reaxys Medicinal Chemistry, a fast virtual screening approach could be
used as an early tool in the drug discovery process to avoid cardiotoxic side
effects related to hERG blockage.
AstraZeneca’s global HERG QSAR model70 has contributed to the reduction in the
synthesis of ‘red flag’ compounds (compounds that are measured to have an HERG
potency of <1μM)
from 25.8% of all compounds tested in 2003 to only 6% in 2010.
Cumming, J.G., Davis, A.M. et al. Nat. Rev. Drug Disc. (2013) 12, 948–962
Even if models are performing well designers need interpretable models

Wrap Up
“They care mostly about
Accessing our data through
API Knime Pipeline pilot”
“They want a product they can
use right out of the box”
New Reaxys Medicinal Chemistry is supporting Hit to lead and lead
optimization process by providing relevant and high quality data to scientists.
Computational Chemists
High quality data on many
different topics (efficacy , ADMET,
Animal models)
Large Amount of data to Perform
models
Medicinal Chemists
Accessing the data through third
party tools
Reaxys Medicinal Chemistry is able to support both Computational and
Medicinal chemists.

How predictive models help Medicinal Chemists design better drugs_webinar

Recomendados

Recomendados

Más contenido relacionado

La actualidad más candente

La actualidad más candente (20)

Similar a How predictive models help Medicinal Chemists design better drugs_webinar

Similar a How predictive models help Medicinal Chemists design better drugs_webinar (20)

Más de Ann-Marie Roche

Más de Ann-Marie Roche (20)

Último

Último (20)

How predictive models help Medicinal Chemists design better drugs_webinar

Notas del editor