Describes MedChemica research on combining Matched Molecular Pair Analysis (MMPA) and Machine Learning (ML) into a closed loop to find and optimize new hits for drug discovery. The talk describes the MMPA and Regression Forest models, how they were combined, and some early conclusions. Of the approaches tested, permutative MMPA is the clear winner (Free Wilson++).
MedChemica Active Learning - Combining MMPA and ML
1. Exploiting medicinal chemistry knowledge to accelerate projects October 2020
October 2020
Not for Circulation
Accelerating lead optimisation with Active Learning -
joining MMPA ADMET knowledge with Regression Forest
machine learning models
Dr Alexander G. Dossetter
Managing Director, MedChemica Ltd
Available on Slideshare - search for Dossetter
Twitter @MedChemica
Twitter @covid_moonshot
Twitter #BucketListPapers
https://www.medchemica.com/bucket-list/
2.
Agenda
• Problem statement
• What is Active Learning?
– How can it be applied to lead identification (LI) and lead optimisation (LO)?
• Generating new ideas with MMPA
– Enumeration with MMPA (RuleDesignTM)
• “hit-to-lead” / “AllRules” / 3pairtrans
• Protein class Rule sets
– Permutative-MMPA (Free Wilson ++)
• Getting the best ideas from small data sets
• Regression Forest models for ‘potency’ prediction
– QSAR revisited with transparent descriptors
– Analysis of error
• Learnings so far
– The system can ‘get stuck’ at the start…
• “It’s like the first 8 moves in chess”
3.
Problem Statement
…8 Years of working with pharma companies
“Our median number of compounds per LO project is 3000 - this is
unsustainable… [it should be] 300”
– Director of Chemistry (large pharma)
“Can we define the text book of medicinal chemistry?”
– Director of Comp Chem (large pharma)
“We are aiming at 300 compounds per project. Currently we are at about 400; we will
get better”
– Exscientia scientist at SCI ‘What can Big Data do for chemistry’
“Can you find us hits [leads] and predict potency on this [brand] new
protein?”
– Many, many people…
MedChemica: using knowledge extraction techniques to build Artificial
Intelligence (AI) systems to reduce the time and cost to critical
compounds and candidate drugs.
4.
Problem Statement
“Can you find us hits [leads] and predict potency on this
[brand] new protein?”
Can we automate Lead compound design?
The algorithm will:
- design compounds and explore SAR
- ‘actively’ select compounds to improve properties
- AND improve the machine learning models
[Figure: small amount of data → Matched Molecular Pair Analysis → Explainable QSAR → awesome leads: pIC50 > 7, good in-vitro PK, SAR, novelty]
5.
Augmenting the Medicinal Chemist
[Figure: the chemist sets goals, prioritizes options and makes decisions; the data is organized and summarized]
6.
Augmented Chemists
[Figure: design cycle with components: proposals (RuleDesign™, Permutative MMPA), ideas, Explainable QSAR models, Alerts, Missing features, Score and store, Make & test, SpotDesign™ (see slide 27)]
7.
Augmenting the Chemist: Lessons so far…
Develop AI constructively
• Use methods that can be directly connected to chemical structures and data
– SpotDesign™, RuleDesign™, Permutative MMPA, Explainable QSAR
• Ensure that all methods are auditable
– See the transformations and underlying data,
see the pharmacophore pairs on molecules
• Automate updates and track metrics
– All systems are automated from the start,
logging is built in
• Integrate automated systems and chemists’ ideas
Principles for Positive Engagement
• Define common goals
• Evaluate with directly observable data
• Expose conflicting views
• Continuous learning and improvement
• Place in context
Chemists: AI Is Here; Unite To Get the Benefits.
Griffen, E. J.; Dossetter, A. G.; Leach, A. G. J. Med. Chem. 2020, 63 (16), 8695–8704.
https://doi.org/10.1021/acs.jmedchem.0c00163
8.
[Figure: system components: Data Warehouse, Automated loader, Clean Structures & Data, MMPA, rule finder, Exploitable Knowledge, Molecule problem solving, Explainable QSAR, Property Prediction, Idea ranking, Instant SAR analysis, REST API & GUI]
Explainable AI for Medicinal Chemistry Design
9.
Griffen, E. et al. J. Med. Chem. 2011, 54 (22), 7739–7750.
Leach et al. J. Chem. Inf. Model. 2017, 57, 2424–2436.
Fully Automated Matched Molecular Pair Analysis (MMPA)
What is this form of Artificial Intelligence?
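[Figure: molecule A transformed to molecule B with the change Δ Data (A–B); atoms around the transformation are labelled by environment level 1 to 4]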
• Matched Molecular Pairs – molecules that differ only by a particular, well-defined structural transformation
• Capture the change and environment – MMPs can be recorded as transformations from A → B
• Statistical analysis to define “medicinal chemistry rules”: defined transformations with a high probability of improving the properties of molecules
• Store in a high-performance database and provide an intuitive user interface
Environment level 4 and higher is very important to P-MMPA
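A toy Python sketch of the matched-pair definition above. It assumes each compound has already been cut into a constant "core" and a single variable fragment, held as plain strings; MCPairs itself fragments real structures and records several environment levels, which this sketch does not attempt. All IDs, fragments and values are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def find_matched_pairs(fragmented, values):
    """Pair up compounds that share an identical core.

    fragmented: compound ID -> (core, fragment) tuple of strings
    values:     compound ID -> measured property
    Returns (id_a, id_b, "frag_a>>frag_b", value_b - value_a) tuples.
    """
    by_core = defaultdict(list)
    for cid, (core, frag) in fragmented.items():
        by_core[core].append((cid, frag))

    pairs = []
    for members in by_core.values():
        # Two compounds on the same core differ only by the fragment swap.
        for (id_a, frag_a), (id_b, frag_b) in combinations(sorted(members), 2):
            if frag_a != frag_b:
                delta = values[id_b] - values[id_a]
                pairs.append((id_a, id_b, f"{frag_a}>>{frag_b}", delta))
    return pairs


compounds = {
    "M1": ("core1", "H"), "M2": ("core1", "Cl"),
    "M3": ("core2", "H"), "M4": ("core2", "Cl"),
    "M5": ("core3", "OMe"),
}
values = {"M1": 5.0, "M2": 5.5, "M3": 6.0, "M4": 7.0, "M5": 6.2}
print(find_matched_pairs(compounds, values))
# [('M1', 'M2', 'H>>Cl', 0.5), ('M3', 'M4', 'H>>Cl', 1.0)]
```

Both pairs share the H>>Cl transformation, so a rule finder would pool their deltas (0.5 and 1.0) into one set of statistics; M5 has no partner on its core and forms no pair.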
10.
From SAR to MMPA…
pSol A (μM) | pSol B (μM) | ∆pSol
-4.3 (48 μM) | -3.2 (700 μM) | 1.1
-6.0 (1.0 μM) | -3.7 (178 μM) | 2.3
-5.7 (2.0 μM) | -4.1 (82 μM) | 1.6
3 pairs, all with +ve solubility change; median ∆pSol 1.6
Pairs shown (ChEMBL IDs): CHEMBL1949790 / CHEMBL1949786; CHEMBL3356658 / CHEMBL218767; CHEMBL456322 / CHEMBL456802
The MCPairs rule finder requires 6 matched pairs for 95% confidence
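The table's arithmetic can be reproduced in a few lines. The values on the slide behave like log10 of molar solubility (negative numbers, higher is more soluble) and ∆pSol is B minus A. Rounding each value to one decimal place before subtracting is an assumption of this sketch; it is what reproduces the slide's 1.1 for the first pair, whose unrounded difference is about 1.16.

```python
import math
import statistics


def log_molar_solubility(sol_um):
    """Convert a solubility in micromolar to log10(S / mol L^-1)."""
    return math.log10(sol_um * 1e-6)


# (solubility of A, solubility of B) in uM, as listed on the slide
pairs_um = [(48, 700), (1.0, 178), (2.0, 82)]

deltas = []
for sol_a, sol_b in pairs_um:
    log_a = round(log_molar_solubility(sol_a), 1)
    log_b = round(log_molar_solubility(sol_b), 1)
    deltas.append(round(log_b - log_a, 1))   # positive: B is more soluble

print(deltas)                      # [1.1, 2.3, 1.6]
print(statistics.median(deltas))   # 1.6
```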
11.
The Matched Pairs leading to a Rule…
Actual Rule from MCPairs
Endpoint: Aqueous Solubility at pH 7.4 [CHEMBL2362975]
n-qual 69 | n-qual-up 47 | n-qual-down 21
median ∆pSol 0.26 | std dev ±0.636
Explainable
• Drill back to real-world examples and measured data
Actionable
• Clear decision to make the compound
12.
Rule selection
• Identify and group matching SMIRKS
• Calculate statistical parameters for each unique SMIRKS (n, median, sd, se, n_up/n_down)
• Is n ≥ 6?
– No: not enough data, ignore the transformation
– Yes: go to the next test
• Is the |median| ≤ 0.05 and the intercentile range (10–90%) ≤ 0.3?
– Yes: the transformation is classified as ‘neutral’
– No: perform a two-tailed binomial test on the transformation to determine the significance of the up/down frequency
• Binomial test
– Pass: the transformation is classified as ‘increase’ or ‘decrease’, depending on which direction the property is changing
– Fail: the transformation is classified as ‘NED’ (No Effect Determined)
[Figure: median data difference axis from -ve through 0 to +ve, with Decrease, Neutral, Increase and NED regions]
• No assumption of normal distribution
• Manages ‘censored’ (qualified / out-of-range) data
Leach et al. J. Chem. Inf. Model. 2017, 57, 2424–2436.
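The rule-selection flowchart can be sketched as a single function. The thresholds (n ≥ 6, |median| ≤ 0.05, 10–90% range ≤ 0.3, two-tailed binomial test) come from the slide; using an exact sign test at 5% significance, the "inclusive" percentile method and the function names are assumptions, and the handling of censored (qualified) data is not implemented. The sketch also shows why six pairs is the minimum: six pairs all moving the same way give a two-tailed p of 2 × 0.5⁶ = 0.03125, whereas five give 0.0625. The deltas in the last line are illustrative values with the same up/down counts as the rule on slide 11, not the real measurements.

```python
import statistics
from math import comb


def sign_test_p(n_up, n_down):
    """Exact two-tailed binomial (sign) test against a 50:50 up/down split."""
    n = n_up + n_down
    k = min(n_up, n_down)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def classify_transformation(deltas, min_n=6, max_abs_median=0.05,
                            max_icr=0.3, alpha=0.05):
    """Classify one transformation from its matched-pair property deltas."""
    if len(deltas) < min_n:
        return "ignore"                        # not enough data
    median = statistics.median(deltas)
    deciles = statistics.quantiles(deltas, n=10, method="inclusive")
    icr = deciles[-1] - deciles[0]             # 10th to 90th percentile range
    if abs(median) <= max_abs_median and icr <= max_icr:
        return "neutral"
    n_up = sum(d > 0 for d in deltas)
    n_down = sum(d < 0 for d in deltas)
    if n_up + n_down > 0 and sign_test_p(n_up, n_down) < alpha:
        return "increase" if n_up > n_down else "decrease"
    return "NED"                               # no effect determined


print(sign_test_p(6, 0))   # 0.03125
print(sign_test_p(5, 0))   # 0.0625
print(classify_transformation([0.3] * 47 + [-0.2] * 21 + [0.0]))   # increase
```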
13.
Molecule Problem Solving - RuleDesign™
RuleDesign™ (formerly “Compounds From Rules”)
• Exploitable Knowledge is a Rule database derived from MMPA
• User puts in a problem molecule with a property they wish to improve
– e.g. solubility, metabolism, hERG…
• System generates potential improved molecules based on data
[Figure: problem molecule + property to improve → Exploitable Knowledge + Enumerator System → solution molecules]
Watch RuleDesignTM on YouTube https://www.youtube.com/watch?v=nQxXddJDTfc
“..it’s like asking 150 of your peers for ideas in just a few seconds”
- Principal Scientist (large pharma)
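A toy sketch of the RuleDesign™ idea: filter a rule table for the property and direction the user asks for, then apply each matching rule wherever its starting fragment appears. The Rule class, the fragment-per-attachment-point molecule representation and every rule value below are hypothetical illustrations, not MCPairs data or the RuleDesign™ implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical MMPA rule: swapping one fragment for another."""
    from_frag: str
    to_frag: str
    endpoint: str
    classification: str   # "increase", "decrease", "neutral" or "NED"
    median_delta: float


def suggest(molecule, endpoint, wanted, rules):
    """Apply each rule that moves `endpoint` in the `wanted` direction.

    molecule maps attachment points (e.g. "R1") to fragment strings.
    Returns (new_molecule, rule) tuples, largest expected change first.
    """
    suggestions = []
    for rule in rules:
        if rule.endpoint != endpoint or rule.classification != wanted:
            continue
        for position, frag in molecule.items():
            if frag == rule.from_frag:
                new_molecule = dict(molecule)
                new_molecule[position] = rule.to_frag
                suggestions.append((new_molecule, rule))
    # For a "decrease" goal the most negative median change ranks first.
    sign = 1 if wanted == "increase" else -1
    suggestions.sort(key=lambda s: sign * s[1].median_delta, reverse=True)
    return suggestions


rules = [
    Rule("Ph", "pyridyl", "solubility", "increase", 0.4),
    Rule("Cl", "F", "solubility", "increase", 0.2),
    Rule("Cl", "Me", "solubility", "NED", 0.1),
    Rule("Ph", "pyridyl", "logD", "decrease", -0.6),
]
problem = {"R1": "Ph", "R2": "Cl"}
for new_molecule, rule in suggest(problem, "solubility", "increase", rules):
    print(new_molecule, rule.median_delta)
# {'R1': 'pyridyl', 'R2': 'Cl'} 0.4
# {'R1': 'Ph', 'R2': 'F'} 0.2
```

Only rules classified in the requested direction are used, so the ‘NED’ Cl>>Me rule is never proposed.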
14.
Looking at the results
• Results sorted by increasing RMM (molecular weight)
• Yellow highlight is the overlap with the input compound
• One column per assay – colour and direction, e.g. LogD decrease, Sol increase
• Hyperlink to “drill back” to the original data
15.
“Multi-Step” transformations
[Figure: Shibuya Crossing, Tokyo, with routes marked between points A to F]
Would you go in steps via A → B → C?
How would you know to go E → F?
Or would you go straight there via D, if the data said it was good?
A Turing Test for Molecular Generators
Green, D. et al. J. Med. Chem. 2020
https://doi.org/10.1021/acs.jmedchem.0c01148
16.
How many pairs? – deeper Goal setting
Specific Goal settings
Non-Rule transformations from pair counts
‘All Rules’
– all of the Increase and Decrease Rules for all datasets
– warning: output can be large
– not suitable for an Excel spreadsheet
‘Hit to Lead’
– the most frequent transformations chemists perform
‘Min 3 pair Trans’
– all transformations with 3 OR MORE matched pairs
‘Min 6 pair Trans’
– all transformations with 6 OR MORE matched pairs
– includes Increase, Decrease, Neutral and NED transformations
17.
Broad Rule Sets
• “Rules” for increasing “potency” are gathered by MMPA
• Individual assay Rules (numbers in brackets) are grouped as a “Broad” Goal
• Example: the Dopamine Rules, numbering 3548 (screenshot)
• Therefore, new hits for a new Dopamine target can have these Rules applied [What worked in the past?]
18.
Permutative MMPA
• Take all compounds in a data set
• Find all matched pairs & extract ΔpIC50 and the transforms between them
• Aggregate transformations with median ΔpIC50 and count of pairs
• Apply all transformations back to the initial data set (at the most specific environment level) – NO R-GROUP MAPPING REQUIRED!
• Predicted pIC50 = substrate pIC50 + median ΔpIC50
• Remove existing compounds
• Prioritize new compounds by pIC50 estimate
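[Figure: transformation t1, observed in matched pairs among M1 to M4, is applied to M5 to give a new compound M*. Workflow: Internal Structures & data → Extract transforms → Apply transforms → New structures & estimated data → Remove existing compounds → Filter and prioritize]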
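The permutative MMPA steps above can be sketched in Python under strong simplifying assumptions: compounds are pre-cut into a (core, fragment) pair, so only single-cut fragment swaps are found, and no environment levels are tracked. Recording each pair in both directions and keeping only the best estimate when several routes reach the same new compound are choices made for this sketch; the data are hypothetical.

```python
import statistics
from collections import defaultdict
from itertools import combinations


def permutative_mmpa(data):
    """Toy permutative MMPA over single-cut (core, fragment) compounds.

    data maps (core, fragment) tuples to measured pIC50.
    Returns (new_compound, estimated_pIC50, n_pairs), best estimate first.
    """
    # 1. Find matched pairs (same core) and record deltas in both directions.
    by_core = defaultdict(list)
    for (core, frag), pic50 in data.items():
        by_core[core].append((frag, pic50))
    deltas = defaultdict(list)
    for members in by_core.values():
        for (frag_a, p_a), (frag_b, p_b) in combinations(members, 2):
            deltas[(frag_a, frag_b)].append(p_b - p_a)
            deltas[(frag_b, frag_a)].append(p_a - p_b)

    # 2. Aggregate each transformation to its median delta and pair count.
    rules = {t: (statistics.median(d), len(d)) for t, d in deltas.items()}

    # 3. Apply every transformation back to every compound in the data set.
    best = {}
    for (core, frag), pic50 in data.items():
        for (from_frag, to_frag), (median_delta, n_pairs) in rules.items():
            new = (core, to_frag)
            if from_frag != frag or new in data:
                continue          # 4. no match, or the compound already exists
            estimate = pic50 + median_delta   # substrate pIC50 + median delta
            if new not in best or estimate > best[new][0]:
                best[new] = (estimate, n_pairs)

    # 5. Prioritise new compounds by estimated pIC50.
    ranked = [(c, est, n) for c, (est, n) in best.items()]
    return sorted(ranked, key=lambda r: r[1], reverse=True)


data = {
    ("core1", "H"): 6.0, ("core1", "Cl"): 6.5,
    ("core2", "H"): 5.0, ("core2", "Cl"): 5.7, ("core2", "OMe"): 5.4,
    ("core3", "H"): 7.0,
}
for compound, estimate, n_pairs in permutative_mmpa(data):
    print(compound, f"{estimate:.1f}", n_pairs)
# ('core3', 'Cl') 7.6 2
# ('core3', 'OMe') 7.4 1
# ('core1', 'OMe') 6.4 1
```

Existing compounds on core1 and core2 are dropped automatically, and H>>Cl carries the median of its two observed deltas (0.5 and 0.7) over to core3.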
19.
Exploit Own or Patent Data
[Figure: transforms extracted from External Patents & data, and from Internal Structures & data, are applied to the internal structures to give New structures & estimated data; existing compounds are removed, then the results are filtered and prioritized]
20.
Client Oncology PPI project example
• 386 patent compounds analyzed
• 6024 pair relationships found (39%: a good number of MMPs)
• Permutative MMPA process:
– apply to own series
– then filter: remove undesirable substructures, estimated potency ≥ 6.5, clogP ≤ 2.5
• 52 suggestions
Measurement = TR-FRET nucleotide exchange assay pIC50, or estimated pIC50 from seed value + ΔpIC50
Explainable
• Visible, original real-world compounds and measurements
Actionable
• Prioritises ‘realistic’ next-step compounds
[Plot: PPI pIC50 vs cLogP, points marked as molecule suggestions (yes/no)]
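The filtering step can be expressed as a small function. The thresholds (estimated pIC50 ≥ 6.5, clogP ≤ 2.5) are from the slide; the dictionary fields, the has_nitro check and all suggestion values are hypothetical, and the string test inside has_nitro is only a stand-in for a proper substructure search with a cheminformatics toolkit.

```python
from typing import Callable, Dict, List


def filter_suggestions(suggestions: List[Dict],
                       is_undesirable: Callable[[str], bool],
                       min_pic50: float = 6.5,
                       max_clogp: float = 2.5) -> List[Dict]:
    """Keep suggestions that pass the project filters, best estimate first."""
    kept = [
        s for s in suggestions
        if not is_undesirable(s["smiles"])      # undesirable substructure check
        and s["est_pic50"] >= min_pic50         # estimated potency >= 6.5
        and s["clogp"] <= max_clogp             # clogP <= 2.5
    ]
    return sorted(kept, key=lambda s: s["est_pic50"], reverse=True)


def has_nitro(smiles: str) -> bool:
    # Placeholder only: a real check would use proper substructure matching.
    return "[N+](=O)[O-]" in smiles


suggestions = [
    {"id": "S1", "smiles": "c1ccccc1O", "est_pic50": 6.9, "clogp": 2.1},
    {"id": "S2", "smiles": "c1ccccc1[N+](=O)[O-]", "est_pic50": 7.2, "clogp": 1.9},
    {"id": "S3", "smiles": "c1ccccc1CCCC", "est_pic50": 7.0, "clogp": 3.4},
    {"id": "S4", "smiles": "c1ccncc1", "est_pic50": 6.5, "clogp": 0.8},
]
print([s["id"] for s in filter_suggestions(suggestions, has_nitro)])
# ['S1', 'S4']
```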
21.
Regression Forest Models
• Features are acid, base, hydrogen bond donor, acceptor, hydrophobe, aromatic attachment, aliphatic attachment and halogen. Definitions are highly engineered SMARTS patterns
• Each bit encodes Feature 1 – topological distance – Feature 2
• Engineered for chemical relevance – features can be superimposed or directly linked, e.g. a group can be both a hydrogen bond acceptor and a base
• A bit identifies a pharmacophore pair, e.g. Aromatic – 3 bonds – Base
• Used as unfolded 360-bit fingerprints
• Regression Forest as the ML method
• Build models with 10-fold CV – report CV Pearson’s R² and CV RMSE
• Build an RF error model to generate a predicted error for each compound, using the same descriptors
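One way to lay out an unfolded pharmacophore-pair fingerprint is sketched below. The eight features are those listed on the slide; the bit layout is an assumption: 8 features give 36 unordered pairs, and 36 pairs × 10 distance bins (0 to 9 bonds) is one arrangement that yields 360 bits, but the slide does not state how the bits are actually ordered. Perceiving features with SMARTS and measuring topological distances on real molecules is not shown.

```python
from itertools import combinations_with_replacement

FEATURES = ["acid", "base", "donor", "acceptor", "hydrophobe",
            "aromatic_attachment", "aliphatic_attachment", "halogen"]
MAX_DISTANCE = 9          # assumption: topological distance bins 0..9

# 8 features give 36 unordered pairs (including same-feature pairs).
PAIRS = list(combinations_with_replacement(FEATURES, 2))
BIT_INDEX = {
    (a, b, d): i * (MAX_DISTANCE + 1) + d
    for i, (a, b) in enumerate(PAIRS)
    for d in range(MAX_DISTANCE + 1)
}
N_BITS = len(BIT_INDEX)   # 36 pairs x 10 distances = 360
ORDER = {f: i for i, f in enumerate(FEATURES)}


def pair_fingerprint(feature_pairs):
    """Unfolded binary fingerprint from (feature, feature, distance) triples."""
    bits = [0] * N_BITS
    for f1, f2, distance in feature_pairs:
        if distance > MAX_DISTANCE:
            continue                                     # assumption: not encoded
        a, b = sorted((f1, f2), key=ORDER.__getitem__)   # pairs are unordered
        bits[BIT_INDEX[(a, b, distance)]] = 1
    return bits


fp = pair_fingerprint([("aromatic_attachment", "base", 3), ("acceptor", "base", 0)])
print(N_BITS, sum(fp))   # 360 2
```

Because each bit names one feature pair at one distance, a bit index can be translated straight back into chemistry, which is what the feature-importance and hotspot views on the following slides rely on.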
22.
Feature – Definition
Basic Group – atom or group most likely protonated at pH 7.4
Acidic Group – atom or group most likely deprotonated at pH 7.4; includes N and C acids
Acceptor – definitions derived from Taylor, Cosgrove et al.
Donor – definitions derived from Taylor, Cosgrove et al.
Hydrophobic – C4 or greater cyclic or acyclic alkyl group
Aromatic Attachment – connection of any group to an aromatic atom, excluding connections within rings
Aliphatic Attachment – connection of any atom to an aliphatic group not in a ring
Halo – F, Cl, Br, I
Reference for Donor acceptor feature definitions:
Taylor, R.; Cole, J. C.; Cosgrove, D. A.; Gardiner, E. J.; Gillet, V. J.; Korb, O. J Comput Aided Mol Des 2012, 26 (4), 451–472.
Acid and base definitions are MedChemica SMARTS definitions covering C, N and heteroaromatic acids and bases; bases include amidines and guanidines but exclude weak aniline bases.
MedChemica Advanced Pharmacophore Pairs
Gobbi, A.; Poppinger, D. Biotechnology and Bioengineering 1998, 61 (1), 47–54.
Reutlinger, M.; Koch, C. P.; Reker, D.; Todoroff, N.; Schneider, P.; Rodrigues, T.; Schneider, G. Mol. Inf. 2013, 32 (2), 133–138.
23.
Regression Forest & Pharmacophore understanding
• hERG – auditable models
• Identify important chemical features driving potency
• Predict hERG potency from RF model [10 fold CV]
Pharmacophore fingerprint length: 280
Validation: 10-fold CV
Compounds in training: 6196
RMSE: 0.37
CV R²: 0.51
24.
Examples of exact Pharmacophore Pairs
HBA-same_group-Base HBA-1_atom-HBD Base-2_atom-Ar
Topological distances are precisely specified and can be exactly visualized on the
molecules – no ambiguity over which features are correlated with activity
Critically, this enables interrogation and validation of SAR understanding
Recorded as an unfolded 360-bit fingerprint: 1 or 0 for the presence or absence of each feature-distance-feature pair
25.
• hERG – auditable models
• Predict hERG potency from RF model [10 fold CV]
• Example: CHEMBL12713 (sertindole)
• Colour the structure by the feature-importance-weighted sum of the pharmacophore pair fingerprints – show the chemists where the hotspots are
• Drill deeper to show the most important positive and negative features
RF prediction: pIC50 7.8
Positive feature: median_with 5.1, median_without 4.7, median_diff 0.4, n_examples_with 4585, n_examples_without 1383
Negative feature: median_with 5.1, median_without 5.3, median_diff -0.2, n_examples_with 3106, n_examples_without 2862
Regression Forest & Pharmacophore understanding
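The per-feature statistics above (median_with, median_without, median_diff and the example counts) can be computed from a training set as below, assuming they are simple medians of the measured values for compounds with and without a given fingerprint bit. The function name and the data are hypothetical.

```python
import statistics


def bit_effect(fingerprints, activities, bit):
    """Median activity of compounds with vs without one fingerprint bit."""
    with_bit = [y for fp, y in zip(fingerprints, activities) if fp[bit]]
    without_bit = [y for fp, y in zip(fingerprints, activities) if not fp[bit]]
    if not with_bit or not without_bit:
        raise ValueError("bit must be set in some compounds and unset in others")
    median_with = statistics.median(with_bit)
    median_without = statistics.median(without_bit)
    return {
        "median_with": median_with,
        "median_without": median_without,
        "median_diff": median_with - median_without,
        "n_examples_with": len(with_bit),
        "n_examples_without": len(without_bit),
    }


fingerprints = [[1, 0], [1, 1], [0, 1], [0, 0]]
activities = [5.0, 6.0, 4.0, 4.5]
print(bit_effect(fingerprints, activities, bit=0)["median_diff"])   # 1.25
```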
26.
Explainable – chemists can see the parts of the molecule that count
Explainable
• Highlighted features show the chemist the contribution to the
prediction
Actionable
• Which parts should be optimized to achieve the Goal
Explainable
• Nearest Neighbours show original data on which model is built
Actionable
• What weight do I put on these results? How likely is it? Do we test?
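Showing nearest neighbours requires a similarity measure. The slide does not say which one is used; this sketch assumes Tanimoto similarity on the binary pharmacophore-pair fingerprints, and the training compounds and values are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two binary fingerprints."""
    on_a = {i for i, bit in enumerate(fp_a) if bit}
    on_b = {i for i, bit in enumerate(fp_b) if bit}
    union = on_a | on_b
    return len(on_a & on_b) / len(union) if union else 0.0


def nearest_neighbours(query_fp, training, k=3):
    """Return the k most similar training compounds with their measured data.

    training: list of (compound_id, fingerprint, measured_value) tuples.
    """
    scored = [(tanimoto(query_fp, fp), cid, value) for cid, fp, value in training]
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:k]


training = [
    ("A", [1, 1, 0, 0], 6.1),
    ("B", [1, 0, 0, 0], 5.2),
    ("C", [0, 0, 1, 1], 7.3),
    ("D", [1, 1, 1, 0], 6.8),
]
for sim, cid, value in nearest_neighbours([1, 1, 0, 0], training, k=2):
    print(cid, round(sim, 2), value)
# A 1.0 6.1
# D 0.67 6.8
```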
27.
RF and kNN are good but……
• The models are good but could be great or even superb…
• Analysis of error identifies the exact “functional groups” that are less accurately predicted
• A feedback loop could design compounds whose testing improves the models
• “Either not enough or the wrong sort of data – the downfall of AI in Life Science?” – Dossetter, A.G.
https://www.linkedin.com/pulse/either-enough-wrong-sort-data-downfall-ai-life-al-dossetter/
Using the model RMSE to estimate error: 78% of measured values fall within the prediction ± RMSE
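The "78% within ± RMSE" figure is a coverage check that is easy to compute. The sketch below uses hypothetical values; as a reference point, if the errors were normally distributed, about 68% of values would fall within ±1 RMSE.

```python
import math


def rmse(measured, predicted):
    """Root-mean-square error between measured and predicted values."""
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def fraction_within(measured, predicted, tolerance):
    """Fraction of measured values within prediction +/- tolerance."""
    hits = sum(abs(m - p) <= tolerance for m, p in zip(measured, predicted))
    return hits / len(measured)


measured = [5.0, 6.0, 7.0, 8.0]
predicted = [5.0, 6.0, 7.0, 6.0]
model_rmse = rmse(measured, predicted)
print(model_rmse, fraction_within(measured, predicted, model_rmse))
# 1.0 0.75
```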
28.
Overview
Generate virtual compounds from MCPairs MMPA
• Hit-to-Lead transformations – the most used medicinal chemistry transformations
• ADMET transformations for metabolism and solubility
• Target class transformations learning from target analogues
• e.g. Dopamine Rules
Regression forest models
• Accurate pharmacophore features with topological distance
• Unfolded fingerprints connect feature importance to
pharmacophores
• Error models give accuracy of prediction for each compound
Active Learning
• Explore Strategy - predicted high potency, high error
• Exploit Strategy - predicted high potency, low error
29.
Active Learning
[Figure: Hits → Compounds with data → Build model with error estimates → Enumerate → Select for Explore and Exploit → Synthesise & Test → Compounds meet criteria? (Yes / No)]
STRATEGIES
Explore: prioritize high error
Exploit : prioritize high potency & low error
Ratio of explore to exploit varies with stage
Select enumeration strategy by stage: Hit-to-Lead, target class, solubility, metabolism
For in silico simulation match to
known and measured compounds
System operational
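The explore / exploit selection step could look like the sketch below. Scoring "high potency & low error" as predicted pIC50 minus predicted error, ranking explore picks purely by predicted error, and expressing the stage-dependent ratio as a single explore_fraction parameter are all assumptions; the candidate values are hypothetical.

```python
def select_batch(candidates, batch_size, explore_fraction):
    """Split a synthesis batch between exploit and explore picks.

    candidates: dicts with "id", "pred" (predicted pIC50) and "err"
    (predicted error from the error model).
    """
    n_explore = round(batch_size * explore_fraction)
    n_exploit = batch_size - n_explore
    # Exploit: high potency and low error, scored here as pred - err.
    exploit = sorted(candidates, key=lambda c: c["pred"] - c["err"],
                     reverse=True)[:n_exploit]
    taken = {c["id"] for c in exploit}
    # Explore: the highest predicted error among what is left.
    remaining = [c for c in candidates if c["id"] not in taken]
    explore = sorted(remaining, key=lambda c: c["err"], reverse=True)[:n_explore]
    return exploit, explore


candidates = [
    {"id": "a", "pred": 7.0, "err": 0.2},
    {"id": "b", "pred": 7.2, "err": 0.8},
    {"id": "c", "pred": 6.0, "err": 1.0},
    {"id": "d", "pred": 6.5, "err": 0.3},
    {"id": "e", "pred": 5.0, "err": 0.5},
]
exploit, explore = select_batch(candidates, batch_size=4, explore_fraction=0.25)
print([c["id"] for c in exploit], [c["id"] for c in explore])
# ['a', 'b', 'd'] ['c']
```

Raising explore_fraction early in a project spends more of the batch on compounds the model is unsure about, which is one way to vary the ratio with stage.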
30.
Active Learning – V1
Challenges:
• How to get started when you only have a few compounds to build a model from
• Limited synthesis resource
D2 Case study
• Start with 30 literature compounds: 5 ≤ pIC50 ≤ 6, -1 < AlogP < 3.5, selected by LLE sort (the literature contains 5200 compounds)
• Build RF model: CV R² -0.26 (small data set)
• Enumerate from all compounds:
• What is the best enumeration strategy?
– How to pick the (few) compounds to make from the enumerated set?
– Enumeration is a success if we match literature compounds (a very stringent test)
– Have we learnt all that the initial set of compounds can teach us?
Strategy (MMPA) | Number of compounds generated | Number of matches to D2 known set | Maximum pIC50 (actual) | Maximum pIC50 (predicted [error])
Hit-to-Lead | 682 | 10 | 7.8 | 5.5 [0.21]
Dopamine class | 469 | 8 | 7.9 | 5.5 [0.23]
Solubility | 10148 | 10 | 7.8 | 5.5 [0.21]
Metabolism | 12729 | 19 | 7.9 | 5.5 [0.21]
Permutative MMPA (env = 4) | 5 | 3 | 7.9 | 6.1 [?]
[Plot: D2 pIC50 vs cLogP]
Round 1…
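The seed-set selection for the D2 case study can be sketched as a filter plus a sort. The pIC50 and AlogP windows and the set size of 30 come from the slide; computing LLE as pIC50 minus AlogP (rather than with another logP or logD measure) is an assumption, and the compounds are hypothetical.

```python
def select_seed_set(compounds, n=30):
    """Pick the starting set: 5 <= pIC50 <= 6, -1 < AlogP < 3.5, top n by LLE.

    compounds: dicts with "id", "pic50" and "alogp".
    """
    eligible = [
        c for c in compounds
        if 5.0 <= c["pic50"] <= 6.0 and -1.0 < c["alogp"] < 3.5
    ]
    # LLE (lipophilic ligand efficiency) = pIC50 - logP, highest first.
    eligible.sort(key=lambda c: c["pic50"] - c["alogp"], reverse=True)
    return eligible[:n]


literature = [
    {"id": "A", "pic50": 5.5, "alogp": 1.0},
    {"id": "B", "pic50": 6.0, "alogp": 3.0},
    {"id": "C", "pic50": 7.0, "alogp": 1.0},   # too potent for the seed window
    {"id": "D", "pic50": 5.2, "alogp": -0.5},
    {"id": "E", "pic50": 5.8, "alogp": 3.6},   # AlogP out of range
]
print([c["id"] for c in select_seed_set(literature, n=2)])   # ['D', 'A']
```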
31.
D2 worked example – The p-MMPA
Predicted: pIC50 6.1, actual pIC50 7.9
Finding all the MMP SAR that is present and
applying it exhaustively including behind the
Pareto frontier.
[Plot: D2 pIC50 vs cLogP]
32.
Active Learning v2
System under development
[Figure: Hits → Compounds with data → P-MMPA (under development) → Compounds with data → Build model with error estimates → Enumerate → Select for Explore and Exploit → Synthesise & Test → Compounds meet criteria? (Yes / No)]
Explore: prioritize high error
Exploit : prioritize high potency & low error
Ratio of explore to exploit varies with stage
Enumerate by: target class, solubility, metabolism
Need initial “induction phase” before cyclic
automated active learning can be applied
33.
Like the opening of a chess game
• “The first moves of a chess game are
termed the "opening" or "opening
moves". A good opening will provide
better protection of the King, control
over an area of the board (particularly
the centre), greater mobility for pieces,
and possibly opportunities to capture
opposing pawns and pieces.” A Beginner's
Garden of Chess Openings - David A. Wheeler
• Success or failure of an automated active learning system could be like the first few moves of a chess game – they shape the game…
• Will it always need human intervention (or ten…)?
…set up for either Queen’s Gambit, King’s Indian Defense, Nimzo-Indian, Bogo-Indian, Queen’s Indian Defense, and Dutch Defense.
34.
Learning from First Experiments….
• MMPA and RF work together to suggest and rank compound designs
• Strategies explored
– Explore: prioritize high error
– Exploit : prioritize high potency & low error
• Ratio of explore / exploit varies with stage
• The initial phase from a small number of hits is a challenge
– Hit-to-Lead / ADMET Rules did not match compounds in the literature
– Victims of what is published
– Requires full datasets
– Process can get “stuck”
• Human intervention may always be required
• Both MMPA and RF can select compounds to make to improve models –
analysis of error.
• Permutative-MMPA works very well (of course)
• Where AI could help is a compound selector depending on strategy
35.
• Dr Alexander G. Dossetter
• Managing Director, MedChemica Ltd
• al.dossetter@medchemica.com
• MedChemica
• Lauren Reid
• Jessica Stacey
• Phil De Sousa
• Shane Montague
• Edward J. Griffen
• Andrew G. Leach
• Available on Slideshare - search for Dossetter
• Twitter @MedChemica
• Twitter #BucketListPapers
• https://www.medchemica.com/bucket-list/
Thank you
36.
About MedChemica
>10 years’ experience in building AI systems for drug discovery
38.
• Founded in 2012 by AZ AP Medicinal / Computational chemists
to accelerate drug hunting by exploiting data driven knowledge
• Domain leaders in SAR knowledge extraction and knowledge
based design
• > 11 years’ experience of building AI systems that suggest actions to chemists (7 years as MedChemica)
• Creators of largest ever documented database of medicinal
chemistry ADMET knowledge
MedChemica Publications
39.
AI Software Platforms
– Complete In-house platform
– Analysis of own data and automated
updating
– Design tool access to all chemists
– Custom fitting (Software-as-a-Service)
One stop GUI
Design tool
Biotech, Universities and
Foundations
Medium to large pharma,
agrochemical and materials
research
– Secure web-based AI design platform
– CHEMBL, Patent data analysed
– Merged into one knowledgebase
40.
Science As A Service (SaaS)
Target ID → Hit Screening → Lead Identification → Lead Optimisation → Pre-Clinical
AI H2L design
sets
Bespoke Advanced Analytics and Computational Chemistry services throughout the research phase
Compound design to
solve ADMET and
potency issues
Third party
compound
assessment
Directed virtual screening
for hit matter
Library design for novel
protein targets
AI Toxophore
assessment
Patent analysis
Pharmacophore
profiling
Generating IP for
clients
[Scaffold hops]
Collection
evaluation
and
enhancement
41.
Panel Discussion:
What should the Medicinal Chemistry Discipline be
like in 10 years?
Slideshare - search for Dossetter
Twitter @MedChemica
Twitter @covid_moonshot
Twitter #BucketListPapers
https://www.medchemica.com/bucket-list/
Speaker notes
Visualisations are anonymised data from an active client project.
Feature definitions are pairs built from the Taylor and Cosgrove definitions, with the addition of a halogen class. Distances are topological distances, and the fingerprints are binary rather than scalar counts of the number of matches.
Feature importance is permutation importance, not impurity-based importance.
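The note says feature importance is permutation importance rather than impurity-based importance. A minimal, model-agnostic Python sketch of the idea follows: shuffle one feature column at a time and measure how much the RMSE worsens. The toy model and data are hypothetical; scikit-learn provides a maintained implementation as sklearn.inspection.permutation_importance.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean increase in RMSE when each feature column is shuffled.

    predict: function mapping one feature row (a list) to a prediction.
    X: list of feature rows; y: list of measured values.
    """
    rng = random.Random(seed)
    baseline = rmse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)          # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(rmse(y, [predict(r) for r in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_model(row):
    # Hypothetical model that uses feature 0 and ignores feature 1.
    return 2.0 * row[0]


X = [[0, 1], [1, 0], [2, 1], [3, 0]]
y = [0.0, 2.0, 4.0, 6.0]
print(permutation_importance(toy_model, X, y))
```

Feature 1 scores exactly zero because shuffling it never changes a prediction; unlike impurity-based importance, this measures the effect on prediction error directly.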
Everyone wants to be able to spot the weak points in a model so it can be improved.
Here because we can identify where the under explored regions of pharmacophore space are, we can choose to bias our ‘explore’ synthesis and testing to improving the model in a transparent and verifiable way.
Because we use precise pharmacophore definitions and Random Forest modelling, understanding where to focus attention is straightforward.
We can generate good compounds from enumeration; the problem is how to rank them. If we generate a lot of compounds, the initial model is not sufficiently discriminating, so generating lots of compounds is not the solution at first! Enumerating from HtL or class transformations is better, but the best approach is first to make sure you have got the most out of the data you already have: permutative MMPA.
In the D2 example, the m-OMe o-OH transformation, if applied to the propyl compound, gives a 1.6 log unit increase in potency (a known, measured compound not in the training set).
Note that env = 4 uses only environment-level-4 transformations from MCPairs, so we transfer only exact SAR rather than generically peppering the compounds with all the substituents, e.g. just m-Cl, not all the chloros.
‘Under dev’ covers MMS and extensions. It’s where Andy Bell at Exscientia comes in, I think.
You might want to put more of the team on the Thank you slide:
E. Griffen, A. Leach, A. Lin, J. Stacey, L. Reid, S. Montague, P De Sousa.